1. Summary

Mobility is a serious limiting factor in the usefulness of unmanned ground vehicles. This
paper contains a description of our approach to developing control algorithms for the Novel
Unmanned Ground Vehicle (NUGV) to address this problem. The NUGV is a
six-degree-of-freedom, sensor-rich small mobile robot designed to demonstrate auto-learning
capabilities for the improvement of mobility through variegated terrain. The learning
processes we plan to implement are composed of classical and operant conditioning of
novel responses built upon pre-defined fixed action patterns. The fixed action patterns
will in turn be modulated by pre-defined low-level reactive behaviors that, as
unconditioned responses, should continuously serve to maintain the viability of the robot
during the activations of the fixed action patterns and of the higher-order (conditioned)
behaviors. The sensors of the internal environment that govern the low-level reactive
behaviors also serve as the criteria for operant conditioning. Using this adaptive
controller, the NUGV should learn to negotiate difficult obstacles, and to protect itself
from collisions and falls.


*  At  the  time  of  this  writing,  the  NUGV  is  in  the  final  stages  of  detail  design  and  prototyping  by  Automated 
Controlled  Environments,  Inc.,  25133  Avenue  Tibbitts,  Unit  A,  Valencia,  CA  91355,  (661)  775-7754  Fax: 
(661)  775-7770,  under  contract  N66001-02-M-X105,  with  support  from  the  Office  of  the  Secretary  of 
Defense Joint Robotics Program (JRP). The author gratefully acknowledges the support of the JRP
Coordinator, and the assistance of ACEi in the preparation of this manuscript.

2. Objectives

2.1.  To  What  Should  We  Aspire? 

The standard to which we should aspire in the control processes of our robots is not an
indolent, ineffectual automaton that consumes the operator’s attention, but rather a mobile,
self-sufficient, loyal, cooperative and obedient agent, somewhat like a hundred-pound
Golden Retriever.

2.2.  Why  Should  We  Aspire  to  This? 

We conceive of our robot as an aid to the operator, not the other way around. Thus our
robot should be there when the operator needs it, ready to assist. Otherwise, the robot
should stay out of the way, and take care of itself.

2.3.  Is  This  Not  Science  Fiction? 

This will not be science fiction if two conditions are met. First, we must define carefully
the needs of the robot and install low-level control processes on the robot to provide for
these needs. Second, if we couple one or more of the solutions to the critical needs of the
robot with some activity of its operators, the probability that the robot will track, trail,
and learn to cooperate with its human operators should be increased.

2.4.  Resolving  the  Conflict 

The reader may sense a contradiction here. I suggest above that the robot must have
low-level control processes that permit it to take care of itself, while at the same time
stating that these must be coupled to activities of the human operator so that the robot is
in some way dependent upon that operator. Can we have both independence and dependence, or
self-interest coincident with social-interest? Can the robot exercise independence by
virtue of its low-level control processes, and then become dependent upon a human
operator through the acquisition of higher-order robot behaviors that also provide service
to the operator? In the following, I will attempt to explain how we can. Resolving this
conflict between self-interest and social-interest should provide for the usefulness of the
robot^.


^  Many  of  the  terms  that  I  will  use  in  this  discussion  come  from  biology  and  psychology.  They  are  thus 
loaded  with  anthropomorphic  connotations.  I  hope  that  the  reader  does  not  become  too  suspicious  at  this 
point,  but  looks  for  my  later  description  of  algorithms  that  will  implement  these  concepts  in  the  artificial 
system of robot hardware and software.

3. Control with Independent Agents

To  achieve  our  objectives,  we  will  implement  a  control  architecture  that  differs 
significantly  from  the  principal  approach  taken  today  in  mobile  robotics.  First,  we  view 
our  robot  as  an  independent  agent,  and  will  attempt  to  endow  it  with  all  of  the  necessary 
capabilities  to  promote  its  own  welfare.  Second,  we  view  our  own  role  in  the  process 
more  as  director  and  collaborator,  than  as  user  and  operator,  and,  as  such,  will  employ 
methods  of  control  that  involve  more  the  leash  than  the  lever. 

3.1.  Control  by  Negation 

To the degree that the robot is self-controlled, it will also be self-motivated. Then, as
it is self-motivated, the operator^ may be excluded from giving explicit instructions on
the direction and intensity of any robot action. Rather, the director (or human
collaborator) should be able to provide information on the intended objective, which
the robot would then be socially motivated to pursue. The director may then observe the
progress of the robot toward that objective, and intervene only as necessary to veto or
negate a particular action that the robot is attempting to execute. Once an action is
negated, the robot would, on its own initiative, select a different approach to the
objective^.
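
The veto mechanism described above can be illustrated with a minimal Python sketch. It is not the NUGV controller: the function name `select_action` and the action labels are hypothetical, and the sketch assumes the robot already holds a ranked list of candidate approaches to the objective.

```python
def select_action(ranked_actions, negated):
    """Return the highest-ranked action the director has not negated.

    ranked_actions: candidate approaches, best first (hypothetical labels).
    negated: set of actions the director has vetoed so far.
    Returns None when every candidate has been negated.
    """
    for action in ranked_actions:
        if action not in negated:
            return action
    return None


candidates = ["scale_obstacle", "go_around", "tumble_over"]
negated = set()
first = select_action(candidates, negated)   # the robot's own first choice
negated.add(first)                           # the director vetoes it
second = select_action(candidates, negated)  # the robot selects an alternative
print(first, second)
```

The director never specifies what to do, only what not to do; the choice of the alternative remains with the robot.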

3.2.  The  Purpose  of  Local  Control  is  Preservation 

An  agent  is  useful  only  while  it  is  viable.  An  agent’s  viability  is  preserved  when  it 
remains  physically  intact,  its  sensors  and  effectors  function  as  designed,  and  its  energy 
reserves  are  adequate  for  any  exigencies.  Factors  that  jeopardize  these  conditions  are 
variously  extremes  of  temperature,  shock  and  other  collisions,  and  un-replenished  power 
consumption. 

3.3.  Homeostasis  is  an  Optimal  State  for  Preservation 

Homeostasis  is  the  state  of  the  agent  that  optimally  predisposes  it  to  perform  some 
additional  activities  within  its  present  environment^.  Thus  an  agent  preserves  itself  by 
performing  activities  that  maintain  its  homeostasis  and  by  avoiding  actions  that  seriously 
disturb  its  homeostasis.  Various  internal  sensors  measure  the  state  of  the  agent,  defining 
its  homeostasis.  At  a  very  low  level  of  control,  these  sensors  are  coupled  with  subsystems 
that  enable  the  activity  of  the  agent.  When  a  subsystem  is  failing,  the  agent’s  activity  is 
threatened,  and  some  change  in  activity  should  occur  to  restore  the  subsystem 
functionality,  in  other  words,  to  restore  homeostasis. 
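
As a minimal sketch of this idea, homeostasis can be treated as a set of internal variables that must each stay inside an acceptable band. The class name, variable names, and numeric ranges below are assumptions made for illustration; the paper does not specify them.

```python
from dataclasses import dataclass


@dataclass
class InternalVariable:
    """One internal sensor channel with an assumed acceptable band."""
    name: str
    low: float
    high: float

    def deviation(self, value):
        """Signed distance outside the band; 0.0 when inside it."""
        if value < self.low:
            return value - self.low
        if value > self.high:
            return value - self.high
        return 0.0


def disturbed(variables, readings):
    """Names of variables whose readings lie outside their bands."""
    return [v.name for v in variables if v.deviation(readings[v.name]) != 0.0]


# Hypothetical bands, for illustration only.
variables = [
    InternalVariable("battery_volts", 11.0, 13.0),
    InternalVariable("core_temp_c", 0.0, 60.0),
]
print(disturbed(variables, {"battery_volts": 10.5, "core_temp_c": 25.0}))
```

A non-empty result would signal that some change in activity is needed to restore homeostasis.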


^ The term operator is inconsistent with the control of an independent agent. We operate machines, and
political operatives are defined by their ability to operate politicians, but we direct actors and employees.

^ Negation will be effective only to the degree that the operator can both tempt and threaten the robot, and
to the degree that the robot can generate alternative actions to achieve the intended objective. We will
discuss these possibilities later in this paper.

^ The permission of additional behavior is also known as survival.

4. A Control Architecture for Independence and Survival

The control processes (algorithms) for our robot must execute within the constraints
imposed upon the robot by our mechanical design, sensors, electronics, and a few (very few)
behavioral preferences. We provide all of these things to the robot in assembly; they are
analogous to the ontological implementation of a genetic code.

4.1.  The  Physical  Constraints 

Developing a local control capability through the use of artificial intelligence (AI)
algorithms should prove feasible in an embodied system such as our Novel UGV. The
physical system of the Novel UGV provides not only constraints, but also a means to
complete the feedback loops with the environment that are essential for stability.

The physical equipment of our robot, which will enable and constrain our AI algorithms, is
shown in Figures 1, 2 and 3. The Novel UGV is composed of three principal segments: a
central core and two pods. All three segments contain electrical power, power
transmission mechanisms, sensors for both the internal and external environments, radios
for inter-pod communication, and electronics for local processing. The core also contains
radios for communication with the operator control unit (OCU).

The  two  pods  are  tracked  for  conventional  tank-type  motion  across  planar  surfaces.  The 
pods  are  each  connected  to  the  central  core  by  a  single  axle,  about  which  the  pods  can 
rotate.  These  two  axles  are  mounted  at  either  end  of  the  core,  and  laterally  near  the  end  of 
each  pod.  The  axles,  with  pods  attached,  can  rotate  about  the  ends  of  the  core. 


Figure 1, Outer appearance of the Novel UGV

The NUGV is symmetrical on all major axes, so that if the image in Figure 1 were rotated
180 degrees in any direction, it would appear the same. Sensors for the external
environment (video cameras, SONAR, and IR proximity detectors) are located on the
faceplates at both ends of the core, and (sans video cameras) on the outboard sides of the
two pods.


Figure  2,  Exterior  dimensions  of  the  Novel  UGV  in  inches. 

The  total  weight  of  the  first  prototype  Novel  UGV  should  be  approximately  30  pounds. 
The  use  of  lighter  materials  in  its  construction  should  reduce  this  weight  by  about  30%. 
The  vehicle  may  scale  upwards  to  increase  payload  and  energy  storage  capacities. 
Downward  scalability  will  be  limited  by  the  availability  of  suitably  scaled  electronics, 
energy  transmissions,  sensors,  and  energy  density  storage  or  recovery  devices.  Recent 
developments  in  micro-electromechanical  systems  (MEMS)  promise  to  significantly  push 
back  limitations  to  the  first  three,  but  micro  energy  storage  or  recovery  issues  are  yet  to 
be addressed.

Figure 3, Cutaway of the Novel UGV showing components.

The layout of some of the internal components of the NUGV can be seen in Figure 3.
Batteries are represented by gray cylinders, circuit boards are shown in light green, and
power transmission devices are shown as black cylinders and belts. Many wires, cables,
and other obscuring components are not shown for clarity.

4.2.  Multiple  Degrees  of  Motion  Freedom  Allow  Multiple  Conformations 

The physical architecture of our robot permits it to assume several different
conformations. Our physical architecture enables six mobility degrees of freedom^. For
comparison, the Foster-Miller Talon tracked robot has two, the iRobot PackBot tracked
robot with flipper assist has three, and the Sony humanoid robot has twenty-eight,
more or less. A sample of the different conformations that are possible with the
Novel UGV’s six degrees of freedom is given in Figure 4. The variable conformation of
the vehicle permits a large diversity of behavioral responses to environmental conditions.
In general the degree of behavioral complexity possible in a mobile agent is a non-linear
function of the mobility degrees of freedom.

Each  of  the  conformations  depicted  in  Figure  4  can  be  achieved  or  passed  through  by  a 
variety  of  combinations  of  pod  motions. 


^ The simultaneous remote control of six mobility degrees of freedom would pose a significant challenge
for a human operator. For this and other reasons we intend to automate much of the local control processes.
As we expand the number of mobility degrees of freedom to allow greater behavioral complexity in our
development of even more capable robots, this local automation will take on even greater importance.

Figure 4, Various possible robot conformations.


Of  the  robot’s  six  degrees  of  freedom  of  movement,  the  two  degrees  of  freedom 
associated  with  the  camber  axes  are  limited  in  transit,  while  the  other  four  degrees  of 
freedom  are  rotationally  continuous. 

Each  conformation  shown  in  Figure  4  will  have  a  different  utility  for  one  of  the  different 
topologies  of  the  surfaces  over  which  the  robot  will  attempt  to  move.  Because  the  robot  is 
symmetrical  along  its  lateral  (X,  side  to  side),  coronal  (Y,  top  to  bottom),  and  sagittal  (Z, 
end  to  end)  axes,  there  will  always  be  two  absolute  conformations  with  respect  to  gravity 
that  will  accomplish  the  same  task  in  the  same  way. 

Given a planar surface with small physical texture relative to the vehicle, the most
efficient conformation of the robot is expected to be that of Figure 4.a. The vehicle is
most stable in this conformation as the maximum amount of track contact with the planar
surface is possible and the vehicle has the lowest center of gravity. From this
conformation the vehicle could execute turns by skid steering wherein the track velocities
are varied between the pods to rotate the vehicle while in place or while progressing.

The conformation shown in Figure 4.b, the open position, could be most useful when a
high barrier must be scaled, or when a narrow chasm or gulf (negative obstacle) with a
width not in excess of the length of one pod must be crossed.

The conformation in Figure 4.c could be useful for elevating the cameras for improved
perspective, and for passing over occasional obstacle clumps.

The conformations of Figures 4.d and 4.f could permit the vehicle to maintain stable
traction on irregular surfaces such as beams, tree branches, gabled rooftops, and pipes
(inside or out). These conformations would also permit the vehicle to avoid high centering
on boulders and other irregularities in the plane of traversal.

The conformation in Figure 4.e represents the pose the robot might take in approaching a
step change on a planar surface.

The choice of conformations for any set of environmental conditions would have to
depend upon the robot’s ability to assess those conditions, and recall previous
conformations that accomplished a task objective and met the optimization criteria.

A second problem is the morphing from one conformation to a more suitable
conformation without losing friction or balance. I will address these problems
progressively through the paper.

4.3.  Information  Flow  During  Control 

The different components of the physical architecture fall within the following classes:

■ Sensors of the internal environment

■ Sensors of the external environment

■ Effectors composed of motors and transmission elements

■ Energy storage composed of batteries

■ Computational resources

The computational resources provide the substrate for connectivity matrices between
sensors and effectors. These matrices are composed of fixed and plastic elements.

These components are shown graphically in Figure 5. The arrows of Figure 5 indicate the
direction of information flow. The control laws are embedded in the two boxes labeled
“fixed connections” and “plastic connections”. The fixed connections are established
primarily by design, while the plastic connections are established primarily through the
vehicle’s experience in operation, though based upon pre-defined mechanisms. Feedback
is indicated in the horizontal arrows between the boxes of connections, and in the line
through the environment that provides information on the physical consequences of the
robot’s behavior.

Figure 5, Schematic Architecture of the Novel UGV Control System


4.4. Representation of the Control Output

Since  our  Novel  UGV  is  symmetrical  on  all  axes,  we  can  define  top  vs.  bottom,  left  vs. 
right,  and  front  vs.  back  only  with  respect  to  gravity  and  to  the  direction  in  which  the 
vehicle  is  moving.  The  six  motors  therefore  can  have  an  absolute  identification  and  a 
relative  identification.  For  most  of  our  discussion  I  will  use  the  relative  identification, 
recognizing  that  the  core  sensors  for  gravity  and  direction  of  motion  will  have  to  route 
the  motor  commands  (M)  to  the  appropriate  motors  in  the  appropriate  way  to  execute  the 
desired  action. 

Each  of  our  six  motors  can  turn  in  either  direction.  We  represent  this  by  12  output 
elements.  The  torque  on  the  motors  will  be  proportional  to  applied  voltage.  We  represent 
the  applied  voltage  by  the  numerical  value  on  the  output  element.  Thus  we  have  the 
following  elements  in  our  motor  vector  (M)^: 

CL, for camber left pod,

CLx, for camber left pod counter clockwise,

CR, for camber right pod,

CRx, for camber right pod counter clockwise,

RL, for rotation of left pod,

RLx, for rotation of left pod counter clockwise,

RR, for rotation of right pod,

RRx, for rotation of right pod counter clockwise,

TL, for track rotation of left pod,

TLx, for track rotation of left pod counter clockwise,

TR, for track rotation of right pod, and

TRx, for track rotation of right pod counter clockwise,

where the suffix x always denotes a counter clockwise rotation from the perspective of the
vehicle. In general, to get the track pods coordinated in the direction of travel of the
vehicle, one pod must move clockwise while the other pod moves counter clockwise.

^ Throughout this paper, I will indicate vector variables by bold type, and scalar variables in regular type.

As it is impossible for any one of the motors to turn in both directions at the same time,
we should provide for contradictory commands to the same motor to cancel at the output
element. For example:

$M_{CL} = CL - CLx$
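
The cancellation rule can be sketched in Python as follows. This is an illustration rather than the NUGV implementation: the function name `net_motor_commands` is hypothetical, the element labels follow the motor vector listed above, and the sketch assumes that each output element carries a non-negative value and that the unsuffixed element of each pair is the clockwise one.

```python
# Motor labels follow the motor vector in the text; "x" marks counter clockwise.
MOTORS = ("CL", "CR", "RL", "RR", "TL", "TR")


def net_motor_commands(outputs):
    """Collapse 12 output elements into 6 signed motor commands.

    outputs: dict mapping element labels (e.g. "CL", "CLx") to values.
    Missing elements are treated as zero. A positive result means the
    unsuffixed direction wins; a negative result means counter clockwise.
    """
    net = {}
    for motor in MOTORS:
        net[motor] = outputs.get(motor, 0.0) - outputs.get(motor + "x", 0.0)
    return net


commands = net_motor_commands({"CL": 3.0, "CLx": 1.0, "TRx": 2.0})
print(commands["CL"], commands["TR"])
```

Equal and opposite commands to the same motor therefore cancel to zero, which is the behavior the text asks for at the output element.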


4.5.  Sensors  of  the  Internal  Environment 

We have a sensor field composed of numerous sensors of the internal environment. These
include nine accelerometers (three for each of the two pods, and three for the core), three
core magnetometers, four track rotation sensors (two per pod), sixteen touch-sensitive
whiskers (four on each end of each pod), two core faceplate pressure sensors (one
at each end of the core), eighty pod plate pressure sensors, three battery voltage sensors
(one in each compartment), and three battery current sensors.
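
The inventory above can be tallied in a few lines of Python. The dictionary keys are hypothetical labels chosen for this sketch; the counts are those given in the text.

```python
# Counts taken from the sensor list in the text; key names are illustrative.
INTERNAL_SENSORS = {
    "accelerometers": 9,             # three per pod, three in the core
    "core_magnetometers": 3,
    "track_rotation_sensors": 4,     # two per pod
    "whiskers": 16,
    "core_faceplate_pressure": 2,    # one at each end of the core
    "plate_pressure": 80,
    "battery_voltage": 3,            # one per compartment
    "battery_current": 3,
}

total_channels = sum(INTERNAL_SENSORS.values())
print(total_channels)  # 120
```

On these counts the internal sensor field provides 120 input channels to the control processes.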

The pod plate pressure sensors, the touch-sensitive whiskers, and the core faceplate
pressure sensors would not ordinarily be considered sensors for the internal environment,
but we include them here because they require physical contact with an external
object to produce an output. Thus they are neither predictive of contact, nor descriptive of
the topology of the immediate environment.

A  vector  of  features,  derived  from  the  nine  accelerometers,  defines  the  conformation  of 
the  vehicle  (C).  By  measuring  the  acceleration  vector  in  each  trio  of  accelerometers  and 
comparing  the  measurements  to  each  other,  the  relationship  (tilt  and  camber)  of  the  pods 
to  the  core  can  be  determined.  When  the  motors  are  all  quiescent,  gravity  is  the  only 
influence  on  the  accelerometers,  and  the  accelerometer  input  is  sufficient  for  an 
unambiguous  determination  of  conformation.  Some  examples  of  this  vector  are  shown  in 
Figure  6. 
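
One way to compare the accelerometer trios when the motors are quiescent is to compute the angle between the gravity vectors measured in the core and in a pod. The sketch below assumes that each trio reports a three-component vector in its own segment frame; the function name `angle_between` and the sample readings are hypothetical. A single angle does not separate tilt from camber, so a full conformation vector would need more than this one feature.

```python
import math


def angle_between(a, b):
    """Angle in degrees between two 3-axis accelerometer readings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-length acceleration vector")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_theta))


core_reading = (0.0, 1.0, 0.0)  # gravity along the core Y axis (assumed units of g)
pod_reading = (0.0, 0.0, 1.0)   # gravity along the pod Z axis
print(angle_between(core_reading, pod_reading))
```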

When the pods are in motion with respect to the core, the pod accelerometers will sense
both the pod motion acceleration and gravity. The two effects will be confounded. The
conformation will be changing during these motion-imposed accelerations. The effects of
motion acceleration could be extracted from the effects of gravity, as there is a very
predictable effect on the accelerometers with the different changes in axle position due to
activations of the motors. However, we do not yet have sensors for axle rotation. Thus
there will remain some conformation uncertainty until the pod and camber rotations stop.

Figure 6, Accelerometer indicators of conformation. Panels: flat surface, all pods “stowed”;
flat surface, center pod raised to avoid obstacle.

The robot would compare information from its internal sensors to determine its present
conformation, and to assess the success of any attempts to change its conformation. The
control algorithms, however, have information available from all sensors at all times. Some
of it may be irrelevant to the particular control decision (in this case, conformation) but
may later become a disambiguating factor. For example, during changes in
conformation, the pod plate pressure sensors and the whiskers will cooperate with the
accelerometers to determine whether the robot’s contacts are due to the ground
plane, to an obstacle, or to an appropriate leverage point.

4.6.  Fixed  Action  Patterns  (FAP) 

To control the six degrees of freedom during translation, and during the transition from
one conformation to the next, the robot will likely need several different behaviors
composed of sets of coordinated motor commands. Similar organized behaviors, specific
to the physical makeup of an animal, and stereotypical in nature, are called Fixed Action
Patterns (FAP) in the neuroethology community. I will use that term here as well. The
robot’s Fixed Action Patterns exhibit a predictable set of events characterized by
coordinated motor torques and timing. The Fixed Action Patterns do not necessarily
depend upon any particular environmental conditions, but may be invoked by triggers
related to the above sensors of the internal environment.

A network of delay elements that can be invoked as a unit will define each of the six
FAP. The connectivity of the elements in those units will define the sequence and
strength of commands to the 12 output elements. The sub-networks that manage the
different FAP are located in the box labeled “fixed connections” in Figure 5. Each FAP
progresses because the recent history of the current pattern is strong enough to evoke the
next element of the pattern. Thus, barring any changes in the external and internal
environments, a pattern, once initiated, may continue in an infinite loop. In practice an
infinite behavioral loop is impossible, as behavior itself will produce changes in both
environments, disrupting the behavior.
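
The looping character of a FAP can be sketched with a Python generator. This is a deliberately simplified stand-in for the network of delay elements, not the network itself: each step holds a motor command for a fixed number of ticks and then hands over to the next step, and the loop ends only when a disruption is reported. The names `run_fap`, `steps`, and `disrupted` are hypothetical.

```python
from itertools import islice


def run_fap(steps, disrupted=lambda tick: False):
    """Yield one motor command per tick, cycling through `steps`.

    steps: list of (hold_ticks, command) pairs; hold_ticks plays the role
           of a delay before the next element of the pattern is evoked.
    disrupted: callable that reports a change in the internal or external
               environment; the pattern stops when it returns True.
    """
    if not any(hold > 0 for hold, _ in steps):
        return  # nothing to emit; avoid spinning forever
    tick = 0
    while True:
        for hold_ticks, command in steps:
            for _ in range(hold_ticks):
                if disrupted(tick):
                    return
                yield command
                tick += 1


pattern = [(2, {"RL": 1.0}), (1, {"RLx": 1.0})]
first_four = list(islice(run_fap(pattern), 4))
print(first_four)
```

Without a disruption the generator never terminates, which mirrors the infinite loop described above; the `disrupted` callback stands in for the environmental changes that end the behavior.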

4.6.1.  Fixed  Action  Pattern  P.  Porpoising 

FAP-P  may  be  attempted  when  the  robot  is  fully  immersed  in  a  liquid  medium  and  is 
neutrally  buoyant.  Immersion  would  be  sensed  with  the  present  sensor  suite  by  the 
absence  of  contact  information  from  any  of  the  whiskers  or  plate  pressure  sensors.  Under 
these  conditions,  the  rearward  track  pod  would  assume  a  position  180  degrees  to  the  rear 
and  oscillate,  while  the  forward  track  pod  would  maintain  its  normal  position  with  respect 
to  the  core  and  then  oscillate  in  counter  phase  with  the  rearward  track  pod.  The  net  result 
of  the  oscillations  of  the  two  pods  should  be  a  porpoising  of  the  robot  through  the  liquid 
medium.  Diving  and  surfacing  could  be  accomplished  by  varying  the  angles  of  the 
forward  and  rearward  pods  around  which  the  oscillations  are  made. 
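
The counter-phase oscillation can be sketched as two sinusoids half a cycle apart. The function name `porpoise_angles`, the amplitude, the period, and the sinusoidal waveform are all assumptions made for illustration; the text specifies only that the rearward pod sits at 180 degrees and that the pods oscillate in counter phase about adjustable center angles.

```python
import math


def porpoise_angles(t, amplitude_deg=20.0, period_s=2.0,
                    forward_center_deg=0.0, rearward_center_deg=180.0):
    """Return (forward, rearward) pod angles in degrees at time t seconds.

    The rearward pod is driven half a cycle out of phase with the forward
    pod. Shifting the two center angles would correspond to the diving
    and surfacing adjustment described in the text.
    """
    phase = 2.0 * math.pi * t / period_s
    forward = forward_center_deg + amplitude_deg * math.sin(phase)
    rearward = rearward_center_deg + amplitude_deg * math.sin(phase + math.pi)
    return forward, rearward


forward_angle, rearward_angle = porpoise_angles(0.5)
print(forward_angle, rearward_angle)
```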

4.6.2.  Fixed  Action  Pattern  R.  Resting  to  Running 

FAP-R permits the robot to run consistently and rapidly in a particular direction on a
smooth planar surface. This FAP prefers the conformation shown in Figure 4.a. Sensor
conditions that would favor this FAP are significant pod plate pressure, and the absence
of whisker contact. To achieve this conformation, the robot assesses its core
accelerometer values. If the Y-axis (see Figure 7) is at +/- 1, the core is horizontal on
whatever surface the robot is resting. The robot then attempts to match the Y-axes of the
two pods with the core Y-axis value by rotating the pods away from their contact points
without upsetting the Y-axis value. A stable surface would permit this maneuver and the
robot could then close to its normal preferred conformation.
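
The two accelerometer checks in this procedure can be sketched as follows. The function names and the tolerance are hypothetical, and the sketch assumes that the Y-axis readings are expressed in units of g, so that +/- 1 corresponds to gravity acting entirely along the Y axis.

```python
def core_is_horizontal(core_y_g, tolerance_g=0.05):
    """True when the core Y-axis reading is within tolerance of +1 or -1."""
    return abs(abs(core_y_g) - 1.0) <= tolerance_g


def pods_match_core(core_y_g, pod_y_values_g, tolerance_g=0.05):
    """True when every pod Y-axis reading matches the core Y-axis reading."""
    return all(abs(y - core_y_g) <= tolerance_g for y in pod_y_values_g)


print(core_is_horizontal(-0.98))              # inverted but level core
print(pods_match_core(1.0, [0.99, 1.02]))     # both pods aligned with the core
```

Because the vehicle is symmetrical, the check accepts either sign of the Y-axis reading: the robot can be level while upright or inverted.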

To execute a run command from the normal closed position, the simplest mechanism
would be to have the two track motors run essentially at the same speed. Speed control
may be modulated by accelerometers on the X and Z-axes.

Changes in direction of travel could reactively occur in response to the asymmetric
detection of obstacles along the trajectory. The sensor detecting the obstacle, activation of
pod whiskers for example, could trigger a turn away from the obstacle by biasing the
X-axis accelerometer output to the motor controllers. The robot could turn most efficiently
by lifting the ends of one or both pods off of the surface during the turn. Which end is
lifted could depend upon the desired direction of the turn.

If the robot were positively or negatively (but not neutrally) buoyant, it could use
FAP-R to swim on the surface of a liquid medium or crawl on the bottom of its container
respectively.


Figure 7, Vehicle Motion Axes

4.6.3. Fixed Action Pattern S. Scaling

FAP-S permits the robot to scale a large non-vertical obstacle by a combination of
walking and running. The robot would normally initiate the FAP-S by encountering an
obstacle with its whisker sensors. To accomplish scaling, the robot could rotate its two
pods outward from the normal closed conformation until contact is reestablished on the
pressure plates. If the forward pod is on the right of the vehicle, the rotation of both pods
would be counter-clockwise, while the reverse direction of rotation would be performed
if the vehicle were inverted at the time of first contact. During rotation, the forward
tracked pod would normally make contact with the obstacle before the rearward pod
again made contact with the ground plane, and the robot would pull itself up on the
obstacle using a combination of its track tread rotations and forward track pod rotation. If
the obstacle were short, the rotation could continue and the robot would pull itself over the
obstacle. Any unevenness of the obstacle, such as a staircase, could cause the forward
pod to continue in its rotation and the rearward pod to oscillate or porpoise as it attempts
to improve its traction with the obstacle. Because the forward pod is still leading and still
in the open conformation, the same direction of rotation would be maintained, causing the
pod to complete a full rotation.

The  robot  could  perform  a  descent  down  a  slope  by  maintaining  the  same  pattern  as  was 
used  in  the  ascent. 

A  similar  combination  of  walking  and  porpoising  could  also  be  used  to  propel  the  robot 
across  the  surface  of  a  liquid  in  which  it  was  positively  buoyant. 

4.6.4.  Fixed  Action  Pattern  T.  Tumbling 

FAP-T  permits  the  robot  to  tumble  by  alternately  rotating  the  pods  around  the  core  in  a 
consistent  direction.  One  use  for  tumbling  could  be  to  dismount  from  a  straddle  position 
on  a  beam.  A  conceivable  trigger  for  this  FAP  could  be  the  absence  of  forward  and 
rearward  motions  by  any  other  FAP.  The  tumbling  could  be  performed  most  efficiently 
from the normal closed position (Figure 4.a). To initiate tumbling, one pod on the side to
which  tumbling  would  progress  would  begin  a  rotation  under  the  core.  After  a  lag,  the 
second  pod  would  begin  its  rotation  under  the  core.  This  would  tend  to  bring  the  core 
over  the  pod  with  the  first  rotation.  Next,  after  completing  its  range  of  rotation,  the 
direction  of  rotation  would  change  on  the  first  pod,  while  the  second  pod  would  continue 
with  its  rotation  progress  under  the  core  while  the  core  was  being  lifted  away  from  the 
first  pod.  Upon  completion  of  its  rotation  transit,  the  second  pod  would  also  reverse  its 
direction  of  rotation  and  move  to  complete  the  inversion  of  the  platform.  As  either  pod 
reached  the  limits  of  rotation  in  either  direction  it  would  change  direction  and  repeat  the 
process.  In  this  way,  the  tumbling  could  be  completed.  Alternatively,  the  rates  of  rotation 
could  differ,  with  the  pod  moving  faster  initially  in  the  direction  of  the  tumble.  The  rate 
as  well  as  the  direction  of  rotation  could  alternate  at  each  range  limit.  The  pods  could  also 
rotate  on  their  connecting  arms  to  facilitate  tumbling  by  moving  the  center  of  gravity 
further  away  from  the  core. 

4.6.5.  Fixed  Action  Pattern  U.  Undulating. 

FAP-U  permits  the  robot  to  elevate  its  core  above  the  terrain  without  moving  forward.  A 
conceivable  trigger  for  this  behavioral  pattern  could  be  the  detection  of  low  battery 
capacity.  An  elevated  core  might  make  the  robot  easier  to  find.  Other  triggers  could 
include  loss  of  RF  signal,  and  SONAR  indication  of  a  blocked  visual  field.  Thus 
elevating  the  core  could  also  improve  radio  communications,  and  it  could  give  the  robot’s 
video  cameras  a  better  perspective  above  ground  rubble.  Accelerometers  would  provide 
the  primary  sensor  input  during  the  execution  of  this  FAP.  Undulation  could  begin  from 
the  normal  closed  position  by  rotating  both  pods  outward.  Undulation  could  proceed  by 
continuing  the  rotation  until  the  core  ascends  to  its  apogee  and  begins  again  to  descend. 
The  undulation  could  be  halted  at  this  point  whereupon  the  core  would  be  at  its  most 
elevated  position  with  respect  to  the  ground  plane.  Because  of  the  wide  tracks,  the  robot 
should be stable in this position, but during movement, stability could be achieved either
by adjusting the rotation of either pod or by adjusting the direction and rate of the pod
track rotations, or both. Continuing the undulation would involve a reversal of the pod
rotations at this point. At the point of co-pod rotation where core elevation no longer
changes, the direction of pod rotation would again change, lifting the core again to its
apogee.

From the core perigee, continuing the pod rotations in the same direction would restore
the robot conformation to the normal closed position.

To achieve an extended position, useful on steep slopes, the rotation could be interrupted
as the core begins to lift from the surface during pod rotation.

4.6.6.  Fixed  Action  Pattern  W.  Walking 

FAP-W permits the robot to walk consistently in a particular direction on a variegated[8]
planar surface. In this pattern, the track treads could remain still or continue in rotation,
while the pods rotate on the core connection arm in alternating and parallel motions in the
direction of travel. FAP-W could evolve as both pods encounter obstacles[9]. The pod
whose core connection arm is located at the forward end of the core, as defined by the
direction of travel, begins to rotate first. This could be detected by the contact sensors on
the pods or on the core faceplate, or by the accelerometer data. The forward pod would
rotate forward as in the FAP-S. However, when the first pod was rotated fully forward,
the second pod rotation would begin also in the forward direction. This would tend to
elevate the core. Afterwards, the two pods could continue with their rotations at
equivalent rates, remaining about 180 degrees out of phase, undulating the core up and
down over the variegated surface. Turning on such a surface could be accomplished by
activating the tracks in addition to the pod rotations, by differentially rotating the pods,
and by changing the camber angle of the pods.
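
The phase relationship between the two pods during walking can be illustrated with a minimal Python sketch. It assumes, for illustration only, that "fully forward" corresponds to a half turn (180 degrees) of the leading pod and that both pods rotate at the same constant rate. The function names and the default rate are hypothetical and are not part of the NUGV design.

```python
def walking_pod_angles(t, rate_deg_per_s=90.0, full_forward_deg=180.0):
    """Return (leading, trailing) pod rotation angles in degrees at time t (seconds).

    Assumptions (not from the source): constant, equal rotation rates, and the
    trailing pod starts once the leading pod has turned full_forward_deg.
    """
    lag_s = full_forward_deg / rate_deg_per_s        # delay before the second pod starts
    leading = rate_deg_per_s * t
    trailing = rate_deg_per_s * max(0.0, t - lag_s)
    return leading % 360.0, trailing % 360.0


def phase_difference(leading, trailing):
    """Phase lead of the forward pod over the trailing pod, in degrees [0, 360)."""
    return (leading - trailing) % 360.0


if __name__ == "__main__":
    for t in (0.0, 1.0, 2.0, 3.0, 5.0):
        lead, trail = walking_pod_angles(t)
        print(t, lead, trail, phase_difference(lead, trail))
```

Once the trailing pod has started, the two angles stay half a turn apart under these assumptions, which is the "about 180 degrees out of phase" condition in the text.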

4.6.7. Fixed Action Pattern Y. Yawing

FAP-Y may permit the robot to squeeze through a narrow passageway. The trigger for
this maneuver could be activation only of the forward outboard pod whiskers while the
robot was in the normal closed position. That pattern of activation could indicate a gap
through which the robot could attempt to squeeze[10]. The minimum gap width that the
present NUGV could negotiate is approximately eight inches. This pattern begins by the
NUGV backing up and extending the pods outward as in FAP-U; however, at the point
where the pods are horizontal with the core, a camber command is triggered that draws
both pods in (down with respect to gravity). This maneuver will force the pods to rest on
the outboard edges of their track treads. Then, alternately rotating further the ends of the
pods while moving forward should cause the vehicle to yaw back and forth. If the rate of
yaw is correct, the vehicle should pass through an orifice of dimension down to the
minimum. The principle here is that the pods are alternately rotated while the pod camber
angle directs the angle of attack of the treads to turn the vehicle. But a similar pattern
may be accomplished by simple skid steering of the vehicle while in the open position.

[8] An example of a variegated planar surface over which it would be appropriate for this NUGV to walk
would be an extended egg-carton with compartments in sufficient quantity and size to hold 144
basketballs.

[9] The difference between FAP-W and FAP-S could be in the accelerometer indications of core and pod
position when the obstacles are encountered. A consistently inclined core indicates the predominance of the
obstacle in the forward direction, while an oscillating core indicates a variegated surface that may be better
managed by the FAP-W.

[10] Normally FAP-Y would not be attempted when alternative action patterns were available.

4.7. Summary of the Fixed Action Patterns

The various Fixed Action Patterns are summarized in Table 1.

Fixed Action Pattern   Trigger                               Expected Conditions
---------------------  ------------------------------------  ----------------------------
P  Porpoise            Absence of any contact                Immersion

R  Run                 Movement commanded by the activity    Obstacle free
                       monitor in the absence of whisker
                       output

S  Scale               Obstacle is detected in the forward   Obstacles
                       direction of travel by whiskers.
                       Core accelerometer indicates
                       consistent ascending or descending
                       pattern.

T  Tumble              Both forward and reverse motion       Entrapment
                       are blocked

U  Undulate            Low battery voltage; obstacle         Poor visibility, poor RF
                       detection; loss of RF input           communications, low power
                                                             reserves.

W  Walk                Velocity < expected, obstacles.       Variegated surface
                       Core accelerometer indicates          Mud and other impediments
                       inconsistent ascending or
                       descending pattern.

Y  Yaw                 Outboard whisker activation           Presence of a traversable gap

Table 1. Fixed Action Patterns
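
As a rough illustration of how the triggers in Table 1 might be mapped to a pattern, the following Python sketch selects one Fixed Action Pattern from a set of boolean trigger conditions. The `Triggers` fields, the function name, and especially the priority order among competing triggers are assumptions made for this sketch; the paper does not specify an arbitration order.

```python
from dataclasses import dataclass


@dataclass
class Triggers:
    """Hypothetical boolean summary of the trigger conditions in Table 1."""
    any_contact: bool = True              # some whisker or pressure plate is active
    forward_whisker: bool = False         # obstacle sensed in the direction of travel
    outboard_whisker_only: bool = False   # only forward outboard pod whiskers active
    motion_blocked_both_ways: bool = False
    low_battery: bool = False
    rf_lost: bool = False
    core_pitch_consistent: bool = True    # steady ascent/descent rather than oscillation
    slower_than_expected: bool = False


def select_fap(t: Triggers) -> str:
    """Return the letter of the Fixed Action Pattern suggested by the triggers.

    The priority order below is an assumption made for this sketch.
    """
    if not t.any_contact:
        return "P"   # Porpoise: absence of any contact
    if t.motion_blocked_both_ways:
        return "T"   # Tumble: forward and reverse motion blocked
    if t.low_battery or t.rf_lost:
        return "U"   # Undulate: raise the core
    if t.forward_whisker and t.core_pitch_consistent:
        return "S"   # Scale: obstacle ahead, consistent core inclination
    if t.slower_than_expected and not t.core_pitch_consistent:
        return "W"   # Walk: oscillating core on a variegated surface
    if t.outboard_whisker_only:
        return "Y"   # Yaw: tried last, since it is described as a fallback
    return "R"       # Run: commanded movement, no obstacle indications
```

For example, `select_fap(Triggers(forward_whisker=True))` returns `"S"`, while the default `Triggers()` returns `"R"`.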

The Fixed Action Patterns are low-level behavioral repertoires by which the robot
coordinates its movements. The set of Fixed Action Patterns includes nearly
all of the maneuvers possible with the six degrees of motion freedom that are available to
the robot. A greater diversity of overt behavior could be observed when the internal
conditions evolve during behavior and trigger transitions among the patterns. These
transitions could occur at any time during a behavior, and do not require the completion
of one pattern before the initiation of another. The Fixed Action Patterns are behaviors to
which the robot would default under circumstances that required some behavior
but for which no other task-specific action was evidently required.

4.8.  Direction  and  Extent  of  Pod  Rotations  Define  the  Patterns 

Five  of  the  seven  Fixed  Action  Patterns  (less  FAP-T  and  FAP-Y)  differ  primarily  on  the 
directions  and  extents  of  the  pod  rotation  with  respect  to  the  core.  These  differences  are 
shown in Figure 8.

Figure 8, Pod Rotation Differences between five of the seven Fixed Action Patterns

4.9. The Behavioral Constraints

The  behavioral  constraints  are  simple  reactive  behaviors  that  constrain  other  behaviors  to 
prevent  serious  disturbances  to  homeostasis.  Thus  I  will  call  these  reactive  behaviors 
Basic  Reactive  Patterns  (BRP).  The  robot  will  come  from  the  factory  equipped  with  a 
few pre-planned[11] BRP that respond to critical events in ways that would restore the
sensors of those events to their states before the events occurred. The sensors involved
are those that monitor the key homeostatic conditions. The BRP occur when certain
pre-established sensor threshold values are breached. The sub-networks that manage the
different BRP are located among the boxes labeled fixed connections and plastic
connections in Figure 5.

[11] Pre-planned in the sense that the rules that govern the definition of the transfer functions between input
and output are pre-determined in the design of the controller, and yet are subject to rapid as well as slow
adaptations to improve performance and compensate for hardware drift.

For our robot, these critical events should be: loss of mobility or inactivity, loss of core
balance, loss of track contact, collision with the core faceplates, and loss of energy. We
selected these critical events for their relevance to the viability of the robot under the
conditions with which we expect it to normally operate. Other critical events could be
considered, and appropriate sensors supplied, such as for temperature, water infiltration,
and tampering, either physically or electronically.

The Basic Reactive Patterns do not necessarily take the robot to any specific location or
in any specific direction, but do continually act to correct or shape any ongoing random,
pre-programmed (such as the fixed action patterns), and/or acquired behaviors of the
robot.

Previously I described seven Fixed Action Patterns that can be assumed by the robot in
response to various internal sensor conditions. All of these behaviors are stereotypical in
the sense that they are completely predictable given the constellation of sensor conditions
in the internal environment. The five Basic Reactive Patterns that I will now describe will
constantly modulate the seven Fixed Action Patterns.

4.9.1.  Basic  Reactive  Pattern  A.  Activity 

The objective of BRP-A is to prevent the robot from either moving too slowly or moving
too rapidly. Movement may be assessed by the integration of the accelerometers and
sensors monitoring the rotations of the track drive wheels. There will be a range of
activity  that  is  optimal  for  the  performance  of  the  robot  and  director.  No  activity  is,  by 
definition,  undesirable  for  a  mobile  robot.  High  levels  of  activity,  while  potentially  useful 
under  extreme  circumstances,  will  more  quickly  deplete  the  energy  reserves  of  the 
vehicle,  subject  it  to  destructive  collisions,  and  reduce  the  usefulness  of  sensor 
information  that  is  returned  to  the  human  observer  during  monitoring.  Thus  the  extremes 
of  inactivity  and  activity  should  be  avoided.  To  accomplish  this,  the  very  low  or  very 
high  activity  readings  should  contribute  to  increases  or  decreases  in  activity  as 
appropriate  to  maintain  activity  within  the  preferred  range. 

The  Gaussian  curve  in  Figure  9  shows  the  expected  relationship  of  activity  levels  and 
system  performance.  To  optimize  performance,  the  system  attempts  to  keep  activity  in 
the  preferred  mid  range  by  modulating  the  activity  of  the  12  output  elements.  An 
alternative  approach  is  to  use  the  activity  gain  to  modify  the  general  inhibitory  or 
excitatory  influences  within  the  controllers.  Specific  inhibitory  or  excitatory  control 
commands  needed  to  execute  any  particular  behavior  could  be  adjusted  by  these  gains. 
Should  we  take  this  approach  to  modulation,  then  a  slow  stealthy  movement  of  the  robot 
could  be  performed  while  the  accelerometer  input  was  attempting  to  move  the  system  to 
the  right  in  Figure  9.  Under  that  circumstance,  a  sudden  decision  to  execute  an  evasive 
maneuver  would  be  facilitated  by  the  elevated  excitatory  gains  and  depressed  inhibitory 
gains.

Figure 9, Relationship between preferred activity levels and performance.

In  an  artificial  system,  to  achieve  some  independence  from  the  fickle  motivations  of  its 
operator,  the  robot  must  provide  an  internal  provocation  that  is  linked  to  activity  itself. 
This  internal  provocation  should  contribute  to  an  apparent  spontaneity  that  permits  trial 
and  error  learning  and  the  exercise  of  learned  behavioral  patterns. 

To  move  without  an  explicit  or  external  provocation  the  robot  could  have  in  its  control 
algorithm  a  parameter  that  assesses  the  total  dynamics  of  its  actuators.  The  dynamics  of 
the  system  are  characterized  by  the  accelerometer  (A)  activity.  Let  this  quantity  be  D. 

Then

$$D_{in} = \sum A$$

D should persist over time (t) with some factor (p) to damp out rapid fluctuations.

$$D_t = p \cdot D_{t-1} + D_{in}$$

The robot should operate usually in the midrange of its capability as shown in Figure 9.
Thus an optimal D should be MaxD/2.

If D is less than MaxD/2, then motor activation commands should be amplified by the
difference. If D is greater than MaxD/2, motor inhibition commands could be amplified
by the difference.

$$D_{out} = 1.0 - \frac{2D}{MaxD}$$

Dout will range from +1.0 to -1.0.

Dout should affect all output elements equally, although in the alternative implementation
the influence may be indirect via the intervening excitatory (E) and inhibitory (I)
elements (see Figure 8). For example:

If $D_{out} > 0$, then

$$C_F = D_{out} \cdot F_{Ecl} - F_{Icl}$$

else

$$C_F = F_{Ecl} + D_{out} \cdot F_{Icl}$$


As these output elements are coupled to mechanical processes with inertia, the principal
effect of changing D will be to change the overall rate at which the activity proceeds[12].

Various non-linear functions may be applied to smooth and limit this process (see
Blackburn, 1987).
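
The three activity quantities defined above (the summed accelerometer input, the damped running value D, and the gain Dout) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the accelerometer readings are summed by absolute value so that opposing axes do not cancel, D is clamped to the range 0 to MaxD, and the function names and default damping factor are hypothetical.

```python
def activity_input(accels):
    """D_in: summed accelerometer activity (absolute values are an assumption)."""
    return sum(abs(a) for a in accels)


def update_activity(d_prev, accels, p=0.5):
    """D_t = p * D_(t-1) + D_in, with p damping out rapid fluctuations."""
    return p * d_prev + activity_input(accels)


def activity_gain(d, max_d):
    """D_out = 1.0 - 2*D/MaxD, ranging from +1.0 (inactive) to -1.0 (maximally active)."""
    d = min(max(d, 0.0), max_d)   # clamp so the gain stays within [-1, +1] (assumption)
    return 1.0 - 2.0 * d / max_d
```

With these definitions the gain is zero at the preferred midrange D = MaxD/2, positive when the robot is less active than preferred, and negative when it is more active.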

4.9.2.  Basic  Reactive  Pattern  B.  Balance 

The objective of the BRP-B is to prevent falls and consequent damage to the vehicle. As
the orientation and motion of the core will be the primary determinants of balance,
balance may be assessed by the core accelerometers. The core will be taken along in
many different ways, however, with the motions of the two track pods, but when these
motions are expected, or predicted by the motor commands, they are at least purposeful,
and not as dangerous as those occurring by accident. Balance or losing balance thus should
depend upon whether the event was expected or not. To establish an expectation, the
automatic control algorithms must make some predictions about how the core
accelerometer data are going to change with a particular maneuver. If those predictions
are borne out, then balance is maintained; however, if events contradict those predicted changes,
then balance would be upset. This prediction should be on-going and depend upon the
integration of data from three vectors: 1) the pattern of activations of the different motors,
2) the pod leverage points, and 3) the current conformation of the vehicle.

Balance would be calibrated continuously. This is most easily seen when the robot is
planning to remain stationary on a stationary surface. Under that condition, its planning
may involve nothing more than the absence of a decision to move. At this time the robot
would be predicting no changes in its core accelerometers. Therefore, any change in the
accelerometers indicates an unexpected change in balance, and should be met with a
reactive and corrective response from its motors. Should part of the surface on which the
robot is resting give way suddenly, the robot’s accelerometer data would change as its
core moves under the influence of gravity. This acceleration can be countered by
activation of the track pod axes that would normally produce an acceleration in the
direction opposite to that of the fall, given its current configuration. The calculation
needed is essentially an inverse of that used to predict a core acceleration in the particular
direction of the error. For this calculation, we can use artificial neural networks that have
a design rich in feedback. We will shortly describe a candidate neural network for this
purpose.

[12] The control processes should be less rate sensitive, and more position sensitive, so that an action will
continue until completion before the next action is initiated.

While the expected pod accelerations for any activation of its motors are easily determined
given  fixed  positions  of  the  core,  this  will  not  ordinarily  be  the  case  with  our  NUGV.  Nor 
will  it  ordinarily  be  our  concern,  as  we  need  to  predict  the  core  accelerometer  activity 
with  activations  of  the  pod  motors.  The  core  will  be  subject  to  disturbances  caused  by 
activation  of  any  of  the  motors,  and  in  all  possible  combinations  while  the  pods 
themselves  are  at  a  great  number  of  different  positions  with  respect  to  the  core  and  to 
their  leverage  points.  Predicting  the  core  accelerations  is  a  complex  multivariate  problem. 

The  expected  accelerations  of  the  core  will  be  functions  of  the  motor  commands  of  the 
vehicle  (M),  the  present  conformation  of  the  vehicle  (C),  and  the  leverage  points  (L). 


$$A_e = f(M, C, L)$$


I  have  already  defined  the  elements  of  the  motor  vector  (M)  and  of  the  conformation 
vector  (C). 

The  leverage  point  vector  (L)  is  simply  a  feature  set  from  the  collection  of  data  points 
from  the  sensors  that  detect  and  locate  pod  contact.  The  sensors  that  participate  in  this 
collection  are  the  pod  whiskers  and  the  pod  plate  pressure  sensors.  For  illustration 
purposes, let us assume that the contact profile for each pod was assessed by only four
discrete sensors, each sensor either being on or off. One sensor would be located at each
end of the pod, and one located on each pod plate. Then the pod contact could be
determined to the resolution of those four locations by one of sixteen different features as
shown in Figure 10. All conditions for each of the integrating elements a-p must be
present before an output can occur. In the Figure, an input line terminating in an arrow
indicates the requirement for an active input, while an input line terminating in a dot
indicates the requirement for an inactive input.
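
The sixteen feature detectors follow directly from four on/off sensors (2^4 = 16 combinations). A minimal Python sketch of this enumeration is given below. The sensor names and the assignment of the labels a-p to particular on/off patterns are assumptions for illustration; the actual assignment is the one drawn in Figure 10.

```python
from itertools import product
import string

# Hypothetical names for the four discrete contact sensors on one pod.
SENSORS = ("front_end", "rear_end", "plate_1", "plate_2")

# One detector per on/off combination, labelled a-p (ordering assumed here).
# Each pattern lists, per sensor, whether an active (True) or inactive (False)
# input is required for the detector to fire.
DETECTORS = dict(zip(string.ascii_lowercase, product((False, True), repeat=len(SENSORS))))


def active_detector(reading):
    """Return the label of the one detector whose required pattern matches the reading."""
    matches = [label for label, pattern in DETECTORS.items() if pattern == tuple(reading)]
    assert len(matches) == 1   # exactly one detector fires for any reading
    return matches[0]


def feature_vector(reading):
    """Sixteen-element one-hot leverage feature vector for one pod."""
    return [int(pattern == tuple(reading)) for pattern in DETECTORS.values()]
```

Because every detector requires all four of its conditions, exactly one element of the feature vector is active for any sensor reading.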

At  this  point,  I  should  note  that  the  sensor  vectors  undergo  significant  organization  in 
most  training  algorithms  for  multi-layer  perceptrons.  An  interim  result  of  this 
organization  is  a  vector  of  feature  detectors  similar  to  what  I  have  shown  in  Figure  10. 
The network designer can greatly simplify the process of self-organization in a multi-layer
perceptron by prescribing the connectivity that defines inclusively all of the
potentially relevant features that are available from the sensor vector, even if some of
those features are never used by the network in calculating the required (trained) output
vectors.

Figure 10, Firing conditions of sixteen hypothetical feature detectors.

The actual calculation of Ae can be performed rather quickly as long as the influences of
the different motor commands given the different conformations and leverage points are
known. We can discover these influences by observing what happens to the core
acceleration under a variety of these conditions, and construct the gains of the function
that calculates Ae. This process is graphically demonstrated in Figure 11. The dotted
circles in the Figure represent the conditioning signal. Facilitation is represented by a line
terminating in an arrowhead. Inhibition is represented by a line terminating in a dot.

Errors (E) in balance are detected by unexpected changes in the core accelerometers (A).
The unexpected measure is a function of the difference between the expected (e) and the
actual (a) reading.

$$E = f(A_e, A_a)$$

In the process of defining the function that predicts core accelerations, the observed core
acceleration with a particular controlled activation of the pod motors is compared with
the current output of the integrator that develops the expected core acceleration. Initially
there will be completely nonsensical prediction, and the error vector (E) will either look
pretty much like Aa or like parts of it. This error is passed back over to the expectation
integrator to modify the gains that determine the influence of its inputs. I give the
modification rule in the section on Activity Dependent Facilitation later in this paper.
Under controlled conditions, the network learns to predict what actually happens to the
core accelerometers.

Figure 11. Method for the acquisition and execution of balance.

We are not finished with the balance calculations however, for the objective of the
balance response is to restore the undisturbed position or orientation of the agent under
uncontrolled conditions.
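
The expectation-learning loop associated with Figure 11 (predict Ae, compare it with Aa, and pass the error back to adjust the gains) can be sketched as follows. This is not the modification rule of the paper, which is given in the section on Activity Dependent Facilitation; a simple linear predictor with a delta-rule update is substituted here purely as a stand-in. The input vector is assumed to be the concatenation of the M, C, and L vectors, and all names and the learning rate are hypothetical.

```python
def predict(gains, inputs):
    """Expected core acceleration Ae, one value per accelerometer axis."""
    return [sum(g * x for g, x in zip(row, inputs)) for row in gains]


def balance_error(expected, actual):
    """E taken as the plain difference Aa - Ae (one possible choice of f)."""
    return [a - e for e, a in zip(expected, actual)]


def train_step(gains, inputs, actual, rate=0.1):
    """Feed the error back to the expectation gains (delta rule, an assumed stand-in)."""
    error = balance_error(predict(gains, inputs), actual)
    for row, err in zip(gains, error):
        for j, x in enumerate(inputs):
            row[j] += rate * err * x   # strengthen or weaken each input's influence
    return error


if __name__ == "__main__":
    gains = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]   # undeveloped gains
    inputs = [1.0, 0.0, 1.0]                     # stand-in for the M, C, L features
    actual = [0.5, -0.2]                         # observed core acceleration Aa
    print(train_step(gains, inputs, actual))     # the first error looks like Aa
```

With undeveloped (zero) gains the first error equals Aa, as the text describes, and it shrinks toward zero as the same controlled condition is repeated.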

What the robot does to correct errors in balance should depend also upon its current state
at the time the error occurred. Another way to look at the relationships in Figure 11 is to
replace the data in the expectation vector with the data in the error vector and then ask for
the motor commands that would contribute to that specific output vector. In other words,
the detected error in acceleration could otherwise be a predicted core acceleration given a
specific conformation of the robot, its current leverage points, and a pattern of motor
commands. We need to solve a set of simultaneous equations to come up with the
missing motor vector that would correct the balance error. We already have the necessary
information to calculate the required motor correction to any error as we have calculated
previously C, L, and E. But as we do not wish to aggravate the error by using the exact
motor commands that would otherwise generate it, we simply need only invert the motor
commands to activate an opposing response. This is easily done as our motor vector is
composed of matched pairs of elements. We invert the vector by crossing the inputs of
these matched pairs.
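
Inverting the motor vector by crossing its matched pairs amounts to swapping the two elements of each pair. The Python sketch below assumes, for illustration, that the opposing elements of each pair are stored next to each other in the vector; the actual element layout is not given in this section, and the function name is hypothetical.

```python
def invert_motor_vector(motor):
    """Swap the elements of each matched pair: (m0, m1, m2, m3, ...) -> (m1, m0, m3, m2, ...).

    Assumes the opposing elements of each pair are adjacent (an assumption).
    """
    if len(motor) % 2:
        raise ValueError("motor vector must consist of matched pairs")
    inverted = []
    for i in range(0, len(motor), 2):
        inverted.extend((motor[i + 1], motor[i]))   # cross the inputs of the pair
    return inverted
```

For example, `invert_motor_vector([1.0, 0.0, 0.0, 0.5])` gives `[0.0, 1.0, 0.5, 0.0]`, and applying the function twice restores the original vector.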

The factors involved in the calculation of the motor correction vector are also shown in
Figure 11. Initially the error will be quite large and approximates the core accelerometer
activity. Initially also the output of the motor correction integrator will be quite small as
its input gains are undeveloped. Very quickly the system learns to predict core
accelerations and to predict the motor commands that generate them. The system will
thus learn to predict itself.

As  the  motor  corrections  are  inverted,  and  the  conformation  and  leverage  inputs  are 
continuous,  this  process  in  the  absence  of  a  core  acceleration  error,  if  left  as  described, 
could  significantly  interfere  with  on-going  motor  commands.  The  C  and  L  vectors  are 
necessary  but  insufficient  conditions  for  a  motor  correction.  There  are  various  ways  to 
inhibit  the  output  of  the  motor  correction  vector  until  an  error  is  present.  The  objective  in 
all  cases  though  would  be  to  prevent  the  motor  correction  until  the  output  of  the  error 
integrator  represents  primarily  errors  of  prediction,  and  the  output  of  the  motor  correction 
integrator  represents  primarily  error  specific  responses. 
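
One of the "various ways" to hold back the motor correction until an error is present is a simple threshold gate on the error vector. The Python sketch below shows that option only as an illustration; the threshold value and function name are hypothetical, and the paper does not state which inhibition mechanism is used.

```python
def gate_motor_correction(correction, error, threshold=0.05):
    """Pass the correction vector only while some error component reaches the threshold."""
    largest = max((abs(e) for e in error), default=0.0)
    if largest < threshold:
        return [0.0] * len(correction)   # no balance error: leave ongoing commands alone
    return list(correction)
```

Under this scheme the continuous C and L inputs alone never produce a correction, which matches the statement that they are necessary but insufficient conditions.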

In  summary,  the  changes  in  motor  commands  necessary  to  correct  for  any  disturbance  in 
balance  are  a  function  of  the  delta  in  the  balance  vector  that  describes  the  nature  of  the 
disturbance  and  the  current  state  of  the  robot  that  will  determine  how  it  can  best  respond 
to  the  disturbance.  Both  the  prediction  of  the  core  acceleration  with  any  given  conditions, 
and  the  motor  commands  that  would  generate  core  accelerations  with  any  given 
conditions  can  be  acquired  from  experience  under  controlled  circumstances.  Balance 
errors  would  be  corrected  then  by  sending  inverted  motor  commands  to  the  twelve  output 
elements.  In  early  development  of  our  control  algorithms  we  can  use  the  observed  core 
accelerations  under  stable  conditions  to  define  (condition  by  experience)  the  transfer 
functions  between  the  C,  L,  and  M  vectors  and  the  Ae  vector.  As  the  network  learns  to 
accurately  predict  the  core  accelerations,  the  error  between  predicted  and  observed 
decreases  and  the  output  of  the  error  integrator  drops  to  zero,  until  an  unexpected  event 
occurs.  Simultaneous  with  the  self-organizing  process  of  core  accelerometer  prediction, 
the  motor  corrections  required  to  restore  balance  given  any  particular  balance  error  (E) 
are  conditioned  by  the  current  motor  commands.  As  each  motor  is  always  subject  to 
opposing  commands,  the  inputs  to  the  opposing  motor  integrators  are  conditioned  by  the 
current  motor  activity.  Motor  correction  commands  would  normally  appear  after 
conditioning  only  when  a  balance  error  occurs.  The  inhibition  of  the  balance  error 
integrator  by  the  expected  core  acceleration  is  very  important  to  allow  the  execution  of 
proper  motor  commands.  I  describe  conditioning  in  greater  detail  later  in  the  section  on 
Activity  Dependent  Facilitation. 

Quite  often  we  should  expect  that  the  robot  will  lose  balance  when  the  unexpected  event 
is  caused  by  a  state  change  in  the  external  environment.  Any  attempt  by  the  robot  to 
restore  its  balance  subsequent  to  this  external  environmental  state  change  should  fail. 
Fortunately,  the  robot  has  additional  Basic  Reactive  Patterns  that  would  be  invoked  in 
this  circumstance,  such  as  the  BRP-D,  and  that  could  cooperate  with  the  BRP-B  to  stay 
the  unexpected  accelerations. 

The  Japanese  humanoid  robot  projects  have  developed  algorithms  to  maintain  balance  in 
a  multi  degree  of  freedom  robot.  These  methods  should  be  studied  for  application  here. 

4.9.3. Basic Reactive Pattern C. Core Collision Avoidance

BRP-C prevents collisions of the core faceplates with objects in the external
environment. The core faceplates contain video cameras, IR proximity sensors, whiskers,
and SONAR. These sensors are open to the environment and, excepting the whiskers,
must be protected. Activity from any of those four sensor types could be used to trigger
the BRP-C, but initially the reaction will occur only in response to activations of the core
faceplate pressure sensors. That is, the basic reaction will be a response to actual
collisions, rather than a collision avoidance mechanism. We will introduce a method later
in this paper that will progressively associate activity of the IR proximity sensors, of the
SONAR, and of the digital video motion vectors with the more proximate detectors down
to the whiskers or touch sensors. Only after the system has learned to associate events
detected by the distance sensors with events occurring in response to activity in the
sensors for the internal environment will true collision avoidance be possible. The BRP-C
backs the vehicle away from an actual (and later, impending) collision. The backing
reaction can be a transient inversion of the preceding motor commands. To prevent the
robot from getting stuck in an infinite loop, a random noise can be imposed on the
subsequent forward command.

The BRP-C should help to reduce entanglements as the vehicle will avoid moving into
objects that may get within the core-pod domain.

The Basic Reactive Pattern C is probably the simplest BRP to explain and to implement.
Only the core faceplate sensors will trigger this BRP, and the response pattern will
usually depend upon the on-going motor commands[13]. For example:

If Wf, then reverse current motor commands.

Where Wf is the core faceplate whisker(s), pressure, or touch sensors in the forward
direction of travel. In lieu of whiskers, any strain sensitive device attached to the
faceplate/core juncture could serve as the detector for collision.

The faceplate protection response offers an opportunity for the director to easily inhibit
any ongoing activity of the robot. Lightly tapping on the faceplate should reverse the
activity of the robot[14].
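
The rule "If Wf, then reverse current motor commands", together with the random noise imposed on the subsequent forward command, can be sketched in Python as follows. For simplicity the sketch represents each motor command as a signed value, so that reversal is a sign change; this representation, the noise amplitude, and the function names are assumptions for illustration only.

```python
import random


def brp_c_response(motor_commands, wf_active):
    """If Wf is active, reverse the current motor commands; otherwise pass them through."""
    if wf_active:
        return [-m for m in motor_commands]
    return list(motor_commands)


def perturbed_forward(motor_commands, rng, noise=0.1):
    """Forward command issued after backing away, with small uniform noise added
    so that the robot does not repeat exactly the same approach."""
    return [m + rng.uniform(-noise, noise) for m in motor_commands]


if __name__ == "__main__":
    rng = random.Random(0)                      # seeded only to make the demo repeatable
    current = [0.5, 0.5]
    print(brp_c_response(current, wf_active=True))
    print(perturbed_forward(current, rng))
```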

4.9.4.  Basic  Reactive  Pattern  D.  Track  Contact 

The objective of the BRP-D is to optimize track contact with leverageable surfaces. From
our earlier discussion of the Fixed Action Patterns, we should conclude that the robot
prefers a conformation in relation to its leverageable surface in which the plate pressure
sensors are maximally active. The BRP would press the track upon a surface that is
perceived by the contact sensors (pressure plates and whiskers) to lie within reach. The
infrared proximity sensors and SONAR may be conditioned to participate in this BRP
based upon the response patterns controlled by the pod pressure plate and whisker
sensors. Preferred surfaces would lie either below the pod track (in the direction of
gravity) or in front of the track (in the direction of travel). Pod rotation and camber, and
track tread rotation may be employed to achieve track contact. If no contact is made, the
BRP-D should cause the pod to randomly explore its immediate environment in search of
a contact point. If the whiskers indicate contact, then the BRP-D should move the pods in
such a way that the whisker contact is replaced by a contact with the center of the tracks.
When track contact is made, the BRP-D should attempt to enlarge it.

[13] The response is complicated a little bit when distance sensors predict a collision and where the robot
may not be in motion, or may be moving too slowly to escape the collision. In this case, the distance
sensors must be mapped to the motor commands that are associated with receding objects in that sensor
field. This mapping can self-organize with the robot’s experience. We will take up the possible mechanism
later in this paper.

[14] During training (to be discussed in the section on Learning and Adaptation), tapping on the robot’s core
faceplate could punish a behavior as effectively as a collision.

Thus, the BRP-D has three components. The first component addresses the means to
make track contact, any contact. The second component addresses the means to shift the
contact point from the location of the whiskers to the location of the center of the track.
And the third component addresses the need to increase track contact.

Exercise of the first component should cause the robot to find and press upon the arms of
a person who was unlucky enough to be suspending the robot by its core^15.

Exercise of the second component should facilitate the climbing of the robot upon any
obstacle that it encountered or that was placed in contact with its whiskers^16.

Exercise of the third component should prevent the robot from rolling into an abyss, and
should complete the effects of the first two components.

Similar to the determination of an appropriate motor command vector to restore balance,
the BRP-D will use the leverage vector (L) to assess pod contact and the conformation
vector (C) to assess the robot’s conformation. However, it will assess L for each pod
separately. Thus there will be an L_L and an L_R for leverage of the left pod and leverage
of the right pod, respectively. Recall that the L vector will be composed of features that
individually describe the conditions that will determine which of the three components of
the BRP-D should execute. It will further assess the activity of the individual elements in
the L_L and L_R vectors. The three running questions for the BRP-D controller will be the
following:

1. What modifications to the present motor commands will be necessary to
randomly search the physical space for contacts?

2. What modifications to the present motor commands will be necessary to replace
whisker contact points with track contact points?

3. What modifications to the present motor commands will be necessary to increase
the magnitudes of the pod pressure elements of both L_L and L_R?


15 The grasping reflex of primates is a biological example of this BRP-D component.

16 A similar response will be observed when one strokes the breast of a perching bird.

The answers to these questions will involve first the assessment of the current L vector.
We can do this by way of example using the data from Figure 10. Let us assume that the
four pressure sensors come from a pod that is connected to the forward end of the core,
and is situated in the closed position on the left side of the core. Let us also assume that
the input pattern from left to right on each element of Figure 10 represents the four pod
pressure sensors distributed clockwise on the pod starting from the top, then fore, bottom,
and aft positions from the perspective of the core. Then the element c in Figure 10
represents the pod making contact only with the surface upon which the pod is resting.
Element o represents a complete suspension of the pod with no contact points except for
its connecting axle with the core. Element f indicates that the pod is wedged between
contact points on its top and bottom surfaces.

BRP-D Rule #1: If the active feature in the L vector for that pod is feature o, then the
BRP-D would initiate a search of local physical space by activating the pod rotation and
camber motors. The sequence and durations of the activation would be randomized^17 and
could persist until a contact was made (a feature other than o occurred in L), or until a
time-out was reached, or until balance was significantly disturbed.

BRP-D Rule #2: If a contact was made that resulted in the features b or d, the BRP-D
could activate a pod rotation command and suspend any camber motion until features a
or c appeared.

BRP-D Rule #3: If a contact was made that resulted in the features a or c, the BRP-D
could continue the pod motor activation as long as the strength of the input was
increasing. If the strength of the input began to decrease, the pattern of pod motor
activation could be reversed, and the test repeated. Both camber and rotation degrees of
freedom would have to be tested separately. During running (FAP-R) the track rotation
motors should also be subject to this rule.

17 Adding random noise to a control process in the absence of sensor input may not be
necessary for, in general, sensor noise exists in all sensor systems, and is ordinarily
suppressed by the presence of valid sensor input. Local adaptation mechanisms that adjust
the sensitivity of the sensors are the usual means of this suppression. As the valid input
decreases, the sensitivity goes up and the probability that noise reaches a stimulus
threshold becomes one.

The above rules will account for most of the conditions occurring during performance of
the seven Fixed Action Patterns. Occasionally, the L vector will contain the features e, g,
h, and j. These will represent encounters with obstacles while moving. Obstacles can be
handled in two basic ways: 1) scale them, and 2) go around them.

The FAP-S and FAP-W will tend to favor the first option.

There remain several other conditions in which features i, k, l, m, n, and p could occur.
These conditions would involve, in general, an entrapment of the pods. To escape from
such conditions, the robot may best run through all three rules randomly and vigorously,
until freedom is achieved. To incorporate this strategy into our rules, we need only add
the conditions to the initiation list of Rule #1. If the motions of the vehicle generated
sufficient wiggle room, then the execution of Rules #2 and #3 would be possible.

The activities of each pod are governed by these rules based on the local pod contact
sensor information. However, some effects of any activity of any portion of the robot will
be transmitted to other portions of the robot, affecting their dynamics, their leverage, and
their sensor feedback. In a sense the two pods will compete for optimal contact with
leverage points, transmitting their intentions through their physical connections with the
core. The BRP-B will serve to mediate any conflicts.
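
The per-pod rule selection described above can be sketched as a small dispatcher keyed on the active feature label (a through p) of each pod's L vector. This is a minimal sketch under stated assumptions: the constant and function names are hypothetical, the sensor patterns behind the labels (Figure 10) are not modeled, and deferring the obstacle features e, g, h, j and the wedged feature f to the Fixed Action Patterns is an assumption rather than something the text prescribes. Continuing when the input strength is unchanged is likewise an assumption.

```python
# Hypothetical action labels for the three BRP-D rules.
SEARCH = "rule1_random_search"
ROTATE = "rule2_rotate_to_track"
ENLARGE = "rule3_enlarge_contact"
DEFER = "defer_to_fixed_action_pattern"  # assumption: not specified in the text

# Rule #1 fires on no contact (o) and, per the text, on entrapment features.
RULE1_FEATURES = set("o") | set("iklmnp")
RULE2_FEATURES = set("bd")
RULE3_FEATURES = set("ac")


def select_brp_d_rule(feature):
    """Map the active feature of one pod's L vector to a BRP-D rule."""
    if feature in RULE1_FEATURES:
        return SEARCH
    if feature in RULE2_FEATURES:
        return ROTATE
    if feature in RULE3_FEATURES:
        return ENLARGE
    return DEFER  # e, f, g, h, j


def rule3_step(direction, prev_strength, strength):
    """Rule #3 sketch: keep the motor direction (+1 or -1) while the input
    strength is not falling; reverse it once the strength decreases."""
    return direction if strength >= prev_strength else -direction


def brp_d_step(left_feature, right_feature):
    """Each pod is assessed separately from its own L vector."""
    return {
        "left": select_brp_d_rule(left_feature),
        "right": select_brp_d_rule(right_feature),
    }
```

Because each pod is dispatched independently, any conflict between the two selected actions would still have to be mediated elsewhere, which the text assigns to the BRP-B.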

The robot will usually have some physical reference (contact or leverage point) during
translation. Therefore, if it encounters an abyss the BRP-D should prevent the robot from
dropping into it. Instead of moving into an abyss, the BRP-D should reorient the robot to
continue along the surface on which it has established leverage for its motion. This
should happen because the BRP-D attempts to increase track contact. Movement that
decreases track contact would be quickly interrupted.

Running along a beam or branch is a simple modification of FAP-R by the Basic
Reactive Pattern D for Track Contact.

4.9.5.  Basic  Reactive  Pattern  E.  Energy  Level  and  Use 

The objective of BRP-E is to acquire and conserve energy. The sensors for BRP-E
measure energy reserve, and energy consumption or utilization. The homeostatic
tolerance for energy level is quite broad, and describes a sigmoid similar to that for
Inhibition in Figure 9. Energy acquisition behaviors need be triggered only when energy
reserves are quite low. In general, the detection of low battery charge should interrupt
most ongoing behavior, and trigger a recharge-specific behavior^18. In the natural
environment, with a limited or non-existent repertoire of navigation behaviors, the
energy-limited robot may best stop all random motor activity and broadcast a call for
help.

4.10.  Motivation 

The sensor inputs that govern each of the five basic reactive patterns above are analogous
to biological motivators. So, for lack of a good engineering term, I call them motivators.
Once again, the five basic motivators are activity, balance, core collision avoidance, track
contact, and energy level.

Table 2 reviews the relationship between the short list of behavioral constraints, which
serve as the intrinsic motivators that drive and determine the most appropriate robot
behavior, and the sensors that monitor the robot’s internal state space (interoceptors).


18 In many interior robotic systems, an example of a recharge-specific behavior is for the
robot to home on its charger and plug itself in. This would be a little more difficult to
accomplish for an exterior robot operating in a complex natural environment.

The Activity motivator is biphasic as it may either increase or decrease activity; the
Balance motivator is monotonic and quickly affects activity to restore balance; the Core
Collision Avoidance motivator is also monotonic and quickly affects activity to withdraw
from collisions; the Track Contact motivator is biphasic as either the absence of contact
or the extremes of contact generate a quick search for a preferred contact profile, while
the occurrence of a preferred contact profile generates a slower attempt to optimize it; the
Energy motivator is currently monotonic as low levels of energy reserve trigger only
energy conserving activities. We may be able to show later that as the energy reserve
moves closer to that trigger point, other energy acquisition behaviors might be invoked.


| Motivator | Influence | Supporting Interoceptors | Utility to Robot |
|---|---|---|---|
| Activity (A) | biphasic | Core and pod accelerometers and pod track rotation sensors | Assesses the result of movement commands. Keeps the robot working. |
| Balance (B) | monotonic | Core accelerometers | Maintains orientation with respect to gravity. Assesses the result of movement commands. |
| Core Collision Avoidance (C) | monotonic | Core faceplate contact sensors | Protects distance sensors. Reduces friction. Prevents entrapments. |
| Track Contact (D) | biphasic | Pod whiskers. Pod plate pressure sensors. | Localizes leverage points. |
| Energy Level and Use (E) | monotonic | Battery charge and current | Maintains adequate energy |

Table 2. Basic Motivators

4.10.1.  Homeostasis  Is  Represented  by  Certain  Values  of  the  Motivators

When the motivators are in the ranges proper for optimal performance, homeostasis is
achieved. We may define a scalar variable for homeostasis (H). Then, the magnitude of H
may represent the totality of the motivator states.

H = f(A, B, C, D, E)

We may arbitrarily specify what magnitude of H is desired; then, in psychological
terms, achieving that magnitude would indicate a comfort zone, and diverging from that
magnitude would indicate dysphoria. We will use this measure of homeostasis later in our
discussion of learning and adaptation.
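
The text leaves the function f unspecified. One possible form, offered only as a sketch, treats each motivator as having a comfort range and lets H be the summed distance of the five motivator values from their ranges, so that the desired magnitude is zero. The comfort ranges, the unweighted sum, and the function names below are all assumptions made for illustration.

```python
def deviation(value, low, high):
    """Signed distance of a motivator value from its comfort range.

    Returns 0.0 inside [low, high], a negative number for deprivation,
    and a positive number for excess.
    """
    if value < low:
        return value - low
    if value > high:
        return value - high
    return 0.0


def homeostasis(states, comfort):
    """Hypothetical H = f(A, B, C, D, E): total absolute deviation.

    states maps motivator name -> current value;
    comfort maps motivator name -> (low, high) comfort range.
    H == 0 corresponds to the comfort zone; larger H means more dysphoria.
    """
    return sum(abs(deviation(states[k], *comfort[k])) for k in comfort)
```

For example, with every comfort range set to (0.25, 0.75) and all motivators at 0.5, this H is 0; dropping track contact (D) to 0.0 raises it to 0.25.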

4.10.2.  Motivation  is  always  referenced  to  a  particular  process 

A sensor value that is out of the “comfort” range for the particular process monitored by
the sensor will contribute to an increase in motivation to restore that particular process.
Prior to the acquisition of significant experiences, we can reference any change in the
behavior of the agent back to a particular interoceptor value that is out of its comfort
range. The “out of comfort” sensor may also contribute to the more general dysphoria
measure H, depending upon the overall level of H achieved.

4.10.3.  Deprivation  and  Excess  Determine  the  Strength  of  a  Motivation 

The magnitude of the deviation determines the degree and direction of the motivation.
Following are some examples.

If track contact was absent, the drive for track contact would rise and contribute to
increasingly greater motions to establish a contact and restore leverage (and
simultaneously a stronger inhibition of the track rotation until adequate leverage
was re-established).

If the robot was restrained, the drive for movement mediated by BRP-A would
increase and the motor responses with all other BRP would be amplified.

If the surface on which the robot was resting pitched to and fro like the deck of a
ship in rough seas, balance would be repeatedly challenged and the motivation to
stabilize itself would increase. The response to repeated loss of balance should
include, in addition to an attempt to increase track contact with leverageable
surfaces, an attempt to close upon any available leverageable surface. This should
occur as a consequence of the cooperation between BRP-B and BRP-D.

If the core faceplate sensors were repeatedly activated, the ongoing motor
behavior could be more vigorously reversed. This should occur as a consequence
of the cooperation between BRP-C and BRP-A.

4.10.4.  Governing  Sensors  Cooperate  to  Invoke  Appropriate  Behaviors 

The examples above also illustrate how motivators cooperate to invoke and control the
most appropriate behaviors. Here are a few more examples specific to this point.

If the tracks reported contact and were spinning with no apparent accelerometer
indications of forward motion of the robot core, the activity motivation would
increase. This would trigger an action pattern transition from FAP-R to FAP-W.

If the robot was tumbling down an embankment, the drive for activity would
decrease, yet balance would be disturbed and the robot would likely invert its
motor commands in an attempt to counter the motion.

Depending upon the axis on which the robot was tumbling, and the concomitant
sensor activity, the robot could transition to different conformations. If the
rotation of tumble was on the X-axis (see Figure 6), the robot could initiate a
FAP-S due to its obstacle scaling response. This would open the pods from the
core and oppose the tumble.

If the robot was tumbling on the Z-axis, the robot could initiate a turn response to
avoid obstacles detected in its pod proximity sensors. This response should also
oppose the energy of the tumble.

4.10.5.  Self  Awareness 

The robot will be self aware to the degree that it can optimize its homeostasis.

Awareness, like perception, requires not only sensor processing but also an effective
motor response. On the sensor side, the sum of the information from the interoceptors of
Table 2 constitutes the input for self-awareness of the robot. We have shown how the
robot must assess and integrate information from all accelerometers to make a
determination of its current conformation and of how its conformation is changing. In
addition, the robot uses the accelerometer data plus the track velocity sensors to assess its
motion with respect to its leverage points. The robot will use the H measure above as a
general index of comfort or of its inverse, dysphoria, while the specific sensors
monitoring the critical state variables will provide information on what must be addressed
at any moment in time. On the effector side, the robot will use its six degrees of motion
freedom to avoid disruptions to homeostasis and to restore the critical state variables.

4.11.  Beyond  the  Fixed  Action  Patterns 

We should ask at this point in our discussion: of just what is the robot capable? Given
only  the  five  Basic  Reactive  Patterns  and  the  seven  Fixed  Action  Patterns  we  expect  that 
the  robot  could  self-initiate  activity  as  its  motivation  for  activity  would  initially  be  quite 
strong.  We  should  also  expect  from  Table  1  that  the  first  FAP  to  be  assumed  would  be  the 
Run.  Other  FAPs  may  follow  as  conditions  warrant.  But  Run  to  where?  An  agent  with 
very  poor  external  sensor  capabilities  may  best  move  randomly  through  the  environment, 
and  depend  on  its  Basic  Reactive  Patterns  to  keep  it  out  of  trouble.  Eventually  though, 
our  robot  would  run  out  of  energy.  The  high  probability  for  this  catastrophic  event  is  due 
to  our  design  omission  that  does  not  provide  the  robot  an  opportunity  to  acquire  energy 
during  any  FAP. 

4.11.1.  Motivation  is  Necessary  but  Insufficient  for  Reliable  Survival

By responding appropriately to the five basic motivators, the agent may survive transient
challenges to its homeostasis brought on by the execution of any of its seven Fixed
Action Patterns, but it would yet tend to be subject entirely to the fluctuations in its
external environment. One mechanism that nature has successfully employed to reduce
this environmental subjugation is to employ distance sensors and associate subtle
changes in the external environment with significant consequential changes in the
internal environment. Upon detection of those subtle changes in environmental cues the
agent can invoke a reactive process that either avoids or approaches the environmental
cue. Those cues that are associated with events that restore or maintain homeostasis are
fortunate for the agent. Those cues associated with events that do not must be avoided,
otherwise those events will tend to terminate or exterminate the agent. Therefore, we
must provide sensors of the external environment that will detect with sufficient
sensitivity the subtle changes (the cues) that will predict significant change to the robot’s
internal environment, and we must provide a mechanism by which the robot can
determine the most appropriate way to respond to those external events.

4.11.2.  Sensors  of  the  External  Environment 

Certain  sensors  that  monitor  conditions  in  external  environments  are  installed  on  the 
vehicle.  These  are  four  IR  short-range  proximity  sensors,  four  mid-range  SONAR,  four 
color video cameras, four acoustic microphones, and two RF transceivers^19. Table 3 lists
the  external  sensors  and  possible  low-level  uses  of  the  available  information. 


| Sensor | Quantity | Locations | Range/sensitivity | Applications |
|---|---|---|---|---|
| SONAR | 4 | Core face plates/pods | 12 < r < 48 inches | Distance to obstacles/leverage points |
| IR | 4 | Core face plates/pods | < 12 inches | Distance to obstacles/leverage points; presence of warm objects |
| Video | 4 | Core face plates | | Color of objects; object distances from optic flows |
| Stereo Audio | 4 | Core side plates | > X dB | Relative location of activity |
| RF | 2 | Core top plates | 1000 feet | Direction to OCU; communication with the director |

Table 3. Sensors of the External Environment


19 The sensor side of the RF transceiver is the receiver that accepts (senses)
communications from the operator control unit.

The basic purpose of these external sensors is prediction. To improve upon its
homeostatic mechanisms, the robot may use its external sensors to predict the different
conditions that it will encounter during its movements. We have noted that the robot’s
movement through the external environment engenders certain risks. Such risks are
primarily related to collisions and to loss of contact with leverageable surfaces (e.g., falls).
The external sensor information then should presage those hazards. Also, the movement
of the robot may increase its likelihood of being recharged. The external sensors should
detect the critical environment features that are associated with an energy source^20.
Similarly, movement itself is a homeostatic motivator, thus the external sensors should
provide information that will indicate a traversable pathway (that is, one that does not
impede movement).

The robot has little control over its external environment, yet its movement within that
environment can change the impact that the environment might have upon it. For
example, the external sensors might detect a looming object and the robot could predict a
possible collision. The robot could move out of the way using similar behavioral
strategies to those that it would employ had the collision been a result of its own motion
through a static environment. Its avoidance of the looming object might preserve its own
physical integrity, but have no effect upon the trajectory of the looming object.

Earlier in our discussion of the Fixed Action Patterns I indicated how the different
patterns could be invoked by activity in the interoceptors. Ideally, the exteroceptors will
provide predictive information that can be used to invoke the transformations among the
Fixed Action Patterns in advance of the interoceptor triggers. In both cases, the changes
in behavioral patterns should be appropriate for the conditions in the external
environment, but in the second case, the robot could anticipate changes in the external
environment and prepare for them. This could reduce errors and increase the speed of
activity.

20 For example, as the human operator is most likely to be associated with energy
recovery, the robot could associate the features indicative of the presence and location of
a human operator with its energy motivation, and orient to those features when energy
reserves were low.

In a competent local control process, the Fixed Action Patterns should vary with
environmental conditions, detected by the external sensors, and modified both by
additional external sensor information and by the internal sensor information that
continually attempts to optimize homeostasis. For example, the robot may detect an
approaching object through its stereo audio and video sensors. The robot could move
away from the object provided that it had adequate energy reserves and a navigable path
to follow. If the range sensors indicated that the path ahead of the movement was clear,
the robot may initiate and continue in a FAP-R. If the SONAR indicated an obstacle
ahead, the robot could turn in the direction of the clearest path as indicated by its
side-looking SONAR and the optic flow from the peripheral fields of its forward-looking
video. If the rearward sensors indicated a progressive pursuit, and the forward sensors
indicated a proximate obstacle, the robot could shift to FAP-W, and attempt to walk over
the obstacle. If the obstacle caused a total tilt greater than approximately 30 degrees, the
robot could initiate a FAP-S, but if the total tilt angle was greater than approximately 45
degrees the robot could suspend pod rotation in the extended position and then continue
to ascend or descend in the frozen FAP-U pattern. However, if the obstacle turned out to
be steep but short, the robot could suspend FAP-U when the core was maximally
elevated, and then shift to FAP-R to move without further undulations but with a higher
perspective over the terrain. Tilt of the video cameras could be accomplished by
differentially rotating the pods in FAP-U.
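
The escape scenario in the preceding paragraph amounts to a small decision cascade, sketched below. The function name, the boolean inputs, and the "TURN" label are hypothetical, and checking tilt before the other conditions is a simplifying assumption; the 30 and 45 degree thresholds are the approximate values given in the text.

```python
def select_pattern(obstacle_ahead, pursued, tilt_deg):
    """Pick an action pattern from exteroceptor and tilt information.

    obstacle_ahead: forward range sensors report a proximate obstacle.
    pursued: rearward sensors indicate a progressive pursuit.
    tilt_deg: total tilt of the robot in degrees.
    """
    if tilt_deg > 45:
        return "FAP-U"   # suspend pod rotation in the extended position
    if tilt_deg > 30:
        return "FAP-S"   # obstacle-scaling pattern
    if obstacle_ahead and pursued:
        return "FAP-W"   # attempt to walk over the obstacle
    if obstacle_ahead:
        return "TURN"    # hypothetical label: turn toward the clearest path
    return "FAP-R"       # path ahead is clear: run
```

The later shift from FAP-U back to FAP-R on a steep but short obstacle would need an additional input for core elevation, which this sketch omits.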
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5.  Learning  and  Adaptation 

5.1.  Motivators  Protect,  Prioritize,  and  Reinforce  Behavior 

The motivators of the five Basic Reactive Patterns can serve many functions in the
control of behavior. Not only do motivators trigger and govern reactive behaviors that
provide immediate protection for the agent, they can also serve as mediators to determine
which of many competing behaviors are selected for expression, and they can serve as the
criteria for the acquisition of new behaviors. We have already seen examples of the first
two roles; I will next address the mechanisms of learning and of decision-making, and
explain the third role for our motivators in the control of behavior.

5.2.  Learning  Enables  Prediction 

It is axiomatic that the measure of success for learning (long-term adaptation) is the
restoration or maintenance of homeostasis. Learned behaviors are appropriate when they
promote the welfare or survival of the agent, which are possible only under homeostasis.
For our agent, the Novel UGV, survival may be determined by the availability of energy,
by the continued operation of its hardware and software, and by its utility to the human
operators. When utility disappears, the agent is subject to the trash heap. When energy
dissipates, or when functionalities of hardware or of software cease, the same trash heap
awaits. The learning objectives then, from the perspective of the agent, should be to
maintain its energy reserves, keep itself together and functional, and meet the needs of its
user. The reader may note that this last objective is something new compared to the five
basic motivators discussed earlier. What will make this new objective possible is learning
and long-term memory.

The external sensors provide information on the environment that can be used both to
predict a homeostatic catastrophe and to predict behavioral alternatives that, if taken, will
avoid catastrophe. Learning is the device used by adaptive natural agents to predict the
conditions in the external environment that will have an impact on the internal
environment and change homeostasis. By reacting to the predictions of these
environmental conditions, in advance of their impact on the internal environment, an
agent is more likely to maintain its homeostasis. We will emulate this device in our
artificial agent, the Novel UGV, to provide it with a similar advantage.

We now describe the learning mechanism that will associate the information available
from sensors of both the external and internal environments, predicting their homeostatic
consequences, and directing future behavior to avoid or approach those external factors
given the current internal state.

5.3.  Classical  Conditioning 

During performance of a Fixed Action Pattern, the Basic Reactive Patterns will modulate
the motor commands according to rules implemented in the Fixed Connection Matrix.
These rules are analogous to the unconditioned stimulus-unconditioned response pairings
of classical or Pavlovian conditioning. When the robot is able to perceive features of the
external environment through its distance sensors, this information becomes available for
association with the unconditioned response. During movement, the core accelerometers,
pod pressure sensors, and faceplate pressure sensors provide the major unconditioned
stimuli to support (facilitate) a conditioned response to features from the distance
sensors. After conditioning (the repeated co-occurrence of the internal and external
events), the features from the distance sensors invoke a response similar to the
unconditioned response but in the absence of the event that originally produced it. The
classical conditioning paradigm is diagrammed in Figure 12. In the sequence of events
during conditioning, the external event usually precedes the internal event (a likely
happening because the external sensor is a distance sensor), but the record of the
occurrence of the external event persists even if the event itself does not. When the
internal event occurs it evokes a predictable response to restore homeostasis. The
persistent trace of the external event becomes associated with the response evoked by the
internal event according to the mechanism of activity dependent facilitation.
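
As an illustration only, the persistence of the record of the external event can be sketched as a decaying trace. The exponential decay, the retention factor, and the names update_trace and decay are assumptions made for this sketch; the text above does not specify how the record persists.

```python
def update_trace(trace, event, decay=0.8):
    """Return the new trace of an external event.

    trace: previous trace value (0.0 means no record remains)
    event: current strength of the external event (0.0 when absent)
    decay: hypothetical per-step retention factor, between 0 and 1
    """
    # The trace follows the event while it is present, then fades gradually.
    return max(event, decay * trace)


# A distance-sensor feature appears for one step and then disappears.
trace = 0.0
history = []
for event in [1.0, 0.0, 0.0, 0.0]:
    trace = update_trace(trace, event)
    history.append(trace)
# The trace is still non-zero several steps later, so a subsequent
# internal event can still be associated with it.
```

With a trace of this kind, the external event need not overlap the internal event in time for the association to form.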

Figure 12. Simplified Classical Conditioning Paradigm (surviving diagram labels: feature from interoceptor, response)


5.4.  Activity  Dependent  Facilitation 

A general learning law, known as activity dependent facilitation (Kandel and Hawkins,
1992), approximates classical conditioning and is useful in determining the contributions
of a particular input through its modifiable connection to an integrating element
preceding an output decision. The law is as follows:

$$\Delta w = G \cdot \left(\frac{z}{e} \cdot w\right) \cdot \bigl(a \cdot (S - m) \cdot (C - w) - m \cdot (w - c)\bigr)$$

where w is the current connection strength of the input in question, z is the activity on
that input element, e is the sum of inputs from all cooperating elements to the integrating
element (prior to their filtering by the w vector), a is the total activation of the
integrating element (equivalent to the product of the e vector and the w vector), S is a
constant representing the maximum permissible sum of weights connecting to any one
element, m is the current sum of weights making contact with the integrating element, C is
a constant representing the maximum permissible weight, c is a lower limit on the weight to
prevent it from disappearing completely if rarely used, and G is a constant equal to
1 / (S * C). When both z and a are present, w is increased, but when z appears alone, w is
decreased.
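
A minimal sketch of the law as a single weight update may help make the roles of the terms concrete. It assumes that the final term of the law reads (w - c), with c the lower limit defined above, and that an element with no input activity (e = 0) leaves the weight unchanged. The function name and the numerical values are hypothetical.

```python
def adf_delta_w(w, z, e, a, m, S, C, c):
    """Weight change under the activity dependent facilitation law.

    w: current connection strength      z: activity on this input
    e: sum of all cooperating inputs    a: activation of the integrating element
    m: current sum of weights on the integrating element
    S: maximum permissible sum of weights
    C: maximum permissible weight       c: lower limit on the weight
    """
    if e == 0:
        return 0.0  # assumption: no input activity, so no change
    G = 1.0 / (S * C)
    growth = a * (S - m) * (C - w)  # facilitation, bounded by S and C
    decay = m * (w - c)             # pulls the weight back toward c
    return G * ((z / e) * w) * (growth - decay)


# Hypothetical values: the input and the integrating element are both active.
paired = adf_delta_w(w=0.5, z=1.0, e=2.0, a=1.0, m=2.0, S=4.0, C=1.0, c=0.05)
# Same input, but the integrating element is silent (a = 0).
alone = adf_delta_w(w=0.5, z=1.0, e=2.0, a=0.0, m=2.0, S=4.0, C=1.0, c=0.05)
```

For these values, paired is positive and alone is negative, which matches the statement that w increases when z and a occur together and decreases when z appears alone.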

The influence of the unconditioned stimulus in the above learning law is incorporated
into the sum of inputs on the integrating element. The connection weights for the UCS
are strong, not modifiable over the short term, and reliably invoke an output decision in
the absence of any other cooperating inputs.

The use of this law permits associations among previously ineffectual feature vectors, so
that several layers of conditioning can occur. A general method of classical conditioning
that uses the above learning law and provides for the evolution of behavioral sequences
is schematized in Figure 13. The method includes an input field of the observed sensor
patterns, short-term storage of the history of those patterns (analogous to short-term
memory), an output field of the predicted pattern that should accompany the result from
the associated motor command, a comparator of the expected and observed patterns, a
field to temporarily store the resulting errors, and an association matrix of the input
history with the current input, the current error and the next motor command. Feedback is
completed through the external environment. In Figure 13, only two processing elements
with all of their connections are shown in each field for clarity. The actual number of
processing elements in the different fields depends upon the resolution of the sensory
field, the complexity of the effector (motor) system, and the resolution and complexity of
the feature detectors. During conditioning, the UCS for matrix A is a collateral from the
Basic Reactive Pattern that is currently in effect. The UCS for matrix B is the current
sensory input, and the UCS for matrix C is the current error. In each case, the UCS is the
event to be predicted. Using an algorithm similar to the model in Figure 13, a predicted
motor response will execute in advance of the original BRP.



Figure 13. General Model for Classical Conditioning of Perceptual Motor Sequences (surviving diagram labels: oldest features, old features, current features; error in expectation; predicted sensor input; observed sensor input; from motor; to motor)

In classical conditioning, novel information from the external environment acquires the
strength to evoke responses that already exist in the agent’s repertoire and are appropriate
for the general conditions that the novel information predicts. Additional information on
the application of this learning model is available in Blackburn and Nguyen (1994).

5.5.  Operant  Conditioning 

The post hoc appropriateness of any particular behavior is determined by factors that
change the sensor values, and, in effect, indicate the change in probability of catastrophe.
Our second axiom is that the Basic Reactive Patterns of behavior operate to reduce the
probability of catastrophe. Thus, the Basic Reactive Patterns show the Adaptive
Behavioral Controller how to operate in order to restore homeostasis. That is, when a
behavioral action initiated by some command from the Adaptive Behavioral Controller
results in an internal sensor reading that indicates that a) activity is restored to its
midrange, b) balance is restored, c) collisions are avoided, d) track contact is improved,
and/or e) energy reserves and/or energy conservation are improved, we can be assured
that the probability of a catastrophe has been reduced. These successful behaviors, under
the given environmental conditions, should be remembered so that they can be repeated
whenever the appropriate conditions reappear. Similarly, when a behavioral action results
in too much or too little activity, loss of balance, collisions, loss of track contact, or
depleted energy, that behavior should also be remembered and inhibited whenever those
prevailing environmental conditions reappear.

When a specific motivator is out of homeostatic bounds, the previously associated
behaviors should be primed for action. An efficient way to accomplish this priming is
through the association of the interoceptor features with features from the exteroceptors.
The biasing of the exteroceptor features would in turn bias specific behaviors when the
environment contained stimuli characterized by those features.

Our third axiom is that all acquired behavior for our robot will be expressed through the
modulation of the seven Fixed Action Patterns using pathways in parallel with the five
Basic Reactive Patterns that also modulate the FAP.

Recall that the FAP are generally modifications of the FAP-R that is executed while the
robot is in its normal closed conformation. The robot expands from this conformation to
adapt primarily to information from its immediate neighborhood sensed by the IR and
Whisker sensors. Recall also that the BRP generally motivate and modify the FAP based
upon information from the sensors that are monitoring the internal environment. Thus,
through our external influences on the stimuli that control the BRP, we can intervene and
modulate any motor command associated with any FAP during performance.

Evidence that learning has occurred will be a modification of a FAP that is not
immediately predicted by a complete knowledge of the internal and external
environment, for learning will have permitted the robot to predict and precede an
environmental event with a unique behavior.

For those readers familiar with the biological learning literature, we will implement
here analogues of instrumental (or operant) conditioning, also known as reinforcement
learning. Like classical conditioning, reinforcement learning requires the agent’s
perception of environmental information. In addition, operant learning requires an action
on the part of the agent separate from the unconditioned response, and it requires some
perceivable consequences of that action. The agent can use any of the available sensor
information for the assessment of the environment and for the assessment of its
behavioral consequences.


The exclusive-or problem that is solvable by a three-layer perceptron is an example of a two-choice
paradigm where one choice must be inhibited in favor of the alternative under the co-occurrence of two
otherwise permissible stimuli.

The process and rules of reinforcement learning that we can implement are as follows:

■  Assess the internal environment (I)

■  Assess the external environment (E)

■  Perform an action (A)

■  Reassess the internal environment, and determine if homeostasis (H) is improved.

■  If H is improved, then associate factors I, E, and A, such that if I and E, then
facilitate A.

■  If H is worsened, then associate factors I, E, and A, such that if I and E, then
inhibit A.
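
The rules above can be sketched as a small association table. This is an illustration under stated assumptions, not the controller itself: homeostasis is reduced to a single score in which larger is better, the factors I, E, and A are discrete labels, and the class name, method names, and step size are hypothetical.

```python
class OperantAssociator:
    """Associates (I, E, A) triples with facilitation or inhibition."""

    def __init__(self, step=0.1):
        self.step = step   # hypothetical size of each adjustment
        self.bias = {}     # (I, E, A) -> positive facilitates, negative inhibits

    def bias_for(self, internal, external, action):
        return self.bias.get((internal, external, action), 0.0)

    def update(self, internal, external, action, h_before, h_after):
        """Apply the rule after an action, given homeostasis before and after."""
        key = (internal, external, action)
        current = self.bias.get(key, 0.0)
        if h_after > h_before:       # H improved: facilitate A under I and E
            self.bias[key] = current + self.step
        elif h_after < h_before:     # H worsened: inhibit A under I and E
            self.bias[key] = current - self.step
        return self.bias_for(internal, external, action)

    def is_inhibited(self, internal, external, action):
        return self.bias_for(internal, external, action) < 0


assoc = OperantAssociator()
assoc.update("low_energy", "director_signal", "approach", h_before=0.2, h_after=0.6)
assoc.update("tilted", "ledge_ahead", "advance", h_before=0.7, h_after=0.3)
```

After these two experiences, "approach" is facilitated when energy is low and the director's signal is present, while "advance" is inhibited when the robot is tilted at a ledge. Triples never experienced remain neutral.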

The above rule suggests that our controller should have a special circuit that can inhibit
or veto a particular action. This circuit may participate in the association rule above
whenever homeostasis is disturbed by a behavior. The rules for operant conditioning are
graphically represented in Figure 14. In Figure 14, arrowheads indicate direction of
information flow. The line terminating in a dot represents inhibition. The dotted circles
represent locations of activity dependent facilitation or inhibition.


Figure 14. Simplified Operant Conditioning Paradigm (surviving diagram label: FAP element)


The director, serving in this case as the supervisor of learning, need not go to great
lengths to manipulate the environment in order that specific changes in homeostasis
accompany particular actions under those conditions. This is because the learning
algorithm above guarantees that the probability of occurrence of a particular action in the
future will depend upon the prevalence of those specific internal as well as external
environmental conditions. In the future, when the director may wish to see that particular
action in response to particular external conditions, the internal conditions may not be
present with sufficient intensity to drive the action above behavioral thresholds or above
competing behaviors. Thus the director should generally not interfere with the internal
conditions of the robot.

The  locus  of  learning  in  our  control  architecture  of  Figure  5  is  the  box  labeled  plastic 
connections.  The  reader  will  notice  that  this  box  receives  input  from  the  internal  sensors, 
the  external  sensors,  and  the  box  containing  fixed  connections.  The  internal  architecture 
and  processes  of  the  two  boxes  containing  fixed  and  plastic  connections  respectively  are 
not  yet  fully  explained.  We  will  define  the  connectivity  within  these  boxes  based  upon 
the  principles  contained  herein  and  report  on  these  details  in  subsequent  documents. 

In  operant  conditioning,  novel  information  from  the  external  environment  acquires  the 
strength  to  evoke  responses  that  already  exist  in  the  agent’s  repertoire,  but  that  were 
previously  unrelated  to  any  intrinsic  motivators. 

5.6.  Fixed  Action  Patterns  Provide  the  Basis  for  New  Behaviors 

The  reinforcement  learning  algorithm  above  requires  that  an  action  take  place  before  the 
test  of  homeostasis.  Before  learning,  the  only  behaviors  of  which  the  robot  is  capable  are 
the  fixed  action  patterns.  Thus  the  robot  will  be  performing  a  fixed  action  pattern  when 
learning  initially  takes  place.  Learning  will  modify  the  particular  FAP  and  invoke  that 
modified  FAP  pattern  in  the  future  whenever  the  associated  internal  and  external 
environmental  conditions  are  present.  When  the  environment  is  novel,  the  agent  will 
default  to  previously  learned  behaviors  or  to  the  original  FAP,  depending  upon  the  degree 
of  novelty  and  motivation. 

After  some  modifications  of  the  seven  FAP,  the  repertoire  may  be  expanded  with  newly 
acquired  behaviors  by  building  upon  the  previous  action  patterns  that  are  invoked  by  the 
prevailing  environmental  conditions.  This  process  is  known  as  behavioral  shaping  and 
permits  learning  to  progress  without  destroying  previously  learned  patterns.  In  this  way 
the  repertoire  could  become  quite  complex,  depending  upon  the  agent’s  ability  to 
discriminate  the  necessary  behavior  specific  features  from  the  external  environment,  and 
upon  its  ability  to  respond  differentially  to  those  features. 

The  seven  FAP  exercise  all  of  the  mobility  degrees  of  freedom  of  the  robot  in 
coordinated  patterns  that  accomplish  mobility  under  a  variety  of  external  conditions.  The 
BRP  provide  transitory  modifications  to  the  coordinated  FAP  to  meet  certain  exigencies 
and  promote  homeostasis.  The  external  sensors  can  extend  through  classical  conditioning 
the  range  of  events  through  which  the  BRP  are  active.  With  any  given  external 
environment,  and  sufficient  range  of  sensitivity  in  the  external  sensors,  the  modifications 
to  the  FAP,  and  even  the  switching  among  them,  can  create  the  impression  of  the 
invention  of  novel  behavioral  patterns,  when  in  fact,  only  old  patterns  are  being 
rearranged. 

Classical and operant conditioning provide for one additional element that increases the
potential for behavioral complexity and unpredictability given the immediate or current
environmental conditions. That element is memory. Memory, however, is nothing more
than the persistence of the associations between the features of the external environment
and the features of the internal environment established through classical conditioning of
the BRP, and through operant conditioning of the FAP.

5.7.  The  Selection  of  One  Behavior  from  a  Repertoire  of  Acquired  Behaviors

Following learning, the occurrence of any desired behavior will depend not only upon
sensor readings in both the internal and external environments, but also upon the
configuration of the plastic connections, their current states, and their transition
probabilities. The configurations and transition probabilities will depend upon the
learning experiences, and the current states will depend upon the short-term history of
activity.

Learning will release the agent from a strict adherence to environmental conditions. The
motivators will continue to modulate behavior, and provide the fundamental drive, but
the direction of the behavior will depend also upon previous experience. The confluence
of the state of motivation and previous experience will add a degree of unpredictability to
the controller relative to the information available to an external observer with knowledge
of only the recent history.

Therefore, the agent will be able to select from a variety of potentially useful behaviors;
the degree of utility will depend upon experience and the present conditions. The
propensity to select from that repertoire will also depend upon experience and the present
conditions. The rules that govern the selection and maintenance of a fixed action pattern
are in fact the same rules that participate in the selection of a behavior from the available
repertoire. Ideally, the robot would make no particular selection unless the conditions
warranted it, but errors acquired in experience due to inappropriate reinforcement would
surely result in errors in later performance.

When external conditions are insufficient to contribute to goal-directed behavior, a
default behavior would likely emerge, for example, random search.

5.8.  Energy  as  a  Motivator  and  Shaper  of  Behavior 

The presence of several cooperating behavioral criteria (see the list in Table 2) permits
one or more criteria to emerge as the dominant driver of behavior and determinant of
learning depending upon the conditions in both the internal and external environments.
When energy reserves are high, for example, the detectors for movement may emerge as the
dominant driver and not only select drive-specific behavior but also determine which
novel behavioral patterns are facilitated and which are inhibited. When energy reserves
are low, the change in energy reserve could be used to reinforce behaviors that contribute
to energy acquisition, even if those might violate the criteria for movement.


Random search could be appropriate when the agent is still acquiring a useful repertoire of behaviors.
Afterwards, the agent may best meet the absence of requirements (i.e., operator commands or energy
disparities) with quiescence (but also with action readiness).

The robot monitors its energy reserves and attempts to maintain the reserves at an
adequate level for continuous operation.

The robot can use energy reserves and energy consumption to control behavior and the
acquisition of new behaviors. The replenishment of energy should be a strong facilitator
of behaviors that led up to the event of replenishment. On the other side of learning, a
high usage of energy during certain behaviors provides a cost measure for those
behaviors that can be used to learn the avoidance of those behaviors in the future.

With respect to energy acquisition behaviors, a full battery capacity provides no
particular stimulus or motivation for the robot, so the robot could be released from its
focus on energy to do other things. A low battery charge, however, should trigger and
maintain a set of behaviors that have proven through experience to increase the battery
charge. Short of plugging itself into a power source, which may be quite distant and
unpredictable, the robot must first acquire other environmental features that are most
often associated with the acquisition of energy. In the biological learning literature, these
other environmental features are known as secondary reinforcers. In our operational
environment, the robot’s director will most likely re-supply the robot with energy.
Therefore, from the perspective of the robot, its director could take on the properties of
secondary reinforcement. The director is like a mother to the robot, and we may look
upon that relationship in very similar ways. Rather than seeking out new batteries
directly, or wall sockets to plug into, or even a charging station, the robot may seek out
its director with the expectation (implied) that the director will do whatever is necessary
to recharge the robot’s batteries.

An energy-depleted robot is probably useless for most of our applications. Thus we
should arrange for the robot to seek out the human director whenever its energy reserves
dip below some threshold. The threshold should be high enough to ensure that the robot
can get back to the director, or at least to assist the director in recovering the robot.

5.8.1.  The  Robot  Must  Attend  to  Its  Director 

Next, we must address the question of how the robot will sense the presence or identity of
its human director. This can be done in several ways, but each comes with some
computational cost. Humans are unique, but the distinguishing features can be subtle.

Face recognition and voice recognition may be useful, and technologically feasible. But
at first we may be satisfied with only the robot’s ability to detect where any human is
located and to move in the appropriate direction to make physical contact.

One mechanism of adaptation is to vary the threshold for some decision. The threshold may be varied by
the addition or subtraction of a quantity that is temporally and spatially consistent with the threshold in the
decision process.

5.8.2.  The  OCU  as  a  Homing  Beacon

All humans emit IR radiation, and usually have predictable body orientations and
proportions. To use these features as cues, the robot must yet have some image capture
and analysis capability. But this too is beyond the current resources of the Novel UGV.
Instead  we  might  use  the  strength  (and  direction)  of  the  radio  transmission  between  the 
OCU  and  the  robot  as  the  director-defining  and  locating  information  for  the  robot.  In 
other  words,  the  active  radio  communication  between  OCU  and  robot  can  tell  the  robot 
that  a  director  is  accessible.  To  get  direction  from  the  OCU  radio  signal,  the  robot  may 
need  a  directional  antenna.  The  robot  may  change  its  orientation  with  respect  to  the  RF 
signal  until  a  maximum  is  found,  then  take  that  identified  heading.  For  positive  identity, 
the  OCU  might  send  an  encrypted  password  that  uniquely  identifies  the  director. 
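
The orientation search described above can be sketched as a scan over headings that keeps the one with the strongest signal. The callback read_signal_strength, the ten-degree step, and the simulated receiver are assumptions for illustration; they do not describe the Novel UGV's radio hardware.

```python
def find_director_heading(read_signal_strength, step_deg=10):
    """Scan a full turn and return the heading (degrees) with the strongest signal.

    read_signal_strength: hypothetical callback that points the directional
    antenna at the given heading and returns the received signal strength.
    """
    best_heading, best_strength = 0, read_signal_strength(0)
    for heading in range(step_deg, 360, step_deg):
        strength = read_signal_strength(heading)
        if strength > best_strength:
            best_heading, best_strength = heading, strength
    return best_heading


def simulated_signal(heading, director_bearing=130):
    """Stand-in for a real receiver: strongest when pointed at the director."""
    offset = abs((heading - director_bearing + 180) % 360 - 180)
    return -offset


heading_to_take = find_director_heading(simulated_signal)
```

In the simulation the director lies at a bearing of 130 degrees, and the scan selects that heading. A real receiver would be noisy, so repeated readings or a finer second scan around the maximum would likely be needed.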

5.8.3.  A  Process  to  Promote  the  Director  as  a  Secondary  Reinforcer 

Initially  the  robot  is  indifferent  to  its  director.  However,  because  of  the  basic  reactive 
pattern  E,  the  robot  is  not  indifferent  to  its  energy  reserve.  When  the  energy  supply  is 
low,  homeostasis  is  disturbed  and  the  director  has  an  opportunity  to  cause  the 
reinforcement  learning  algorithm  to  associate  that  low  energy  reserve  with  environmental 
information  that  is  unique  to  the  presence  and  location  of  the  director,  and  to  an  action 
that  would  bring  the  robot  around  on  future  occurrences  of  low  energy  reserves.  The 
following  learning  paradigm  might  be  employed: 

■  Begin  training  with  depleted  energy  reserves 

■  Use  the  OCU  on  low  broadcast  power 

■  Approach  the  robot 

■  Apply  external  current  to  recharge  the  robot’s  batteries. 

■  Turn  off  the  OCU  radio. 

■  Turn  off  the  external  charging  current. 

■  Repeat  the  process. 

Following  the  above  learning  protocol  that  should  install  the  director  as  a  powerful 
secondary  reinforcer  for  the  energy  motivator,  the  robot  should  have  a  propensity  to  seek 
out  the  director  whenever  its  energy  reserves  are  low.  In  this  scenario,  the  director  is 
synonymous  with  the  communications  signal.  As  processing  power  on  the  robot  is 
improved,  video  and  audio  information  can  also  be  used  to  identify  and  localize  the 
director. 

The director can use his/her position as a secondary reinforcer to control additional
learning. First the director would start the robot in a reduced energy state. This would
trigger the director-seeking behavior. Next the director could place obstacles in the path
of the robot, which the robot must learn to traverse. The increases in signal strength
(from the communications signal, the audio signal, or the IR) could be used to
intrinsically reinforce the behaviors employed by the robot in its traversal.

Simultaneously, the co-activation of the BRP will condition the acquisition of obstacle negotiation
behaviors motivated by the reinforcement from the communication signal strength (see the section on
Learning Mobility).

5.9.  Behavior  is  Multiply  Determined




Because energy and communication are critical to the utility of the robot, the propensity
of the robot to roam, explore, return to the director, or perform some other routine to
improve communications may be dependent upon the operational conditions of
communication and energy. Some of the possibilities are given in Table 4 below.

From the Table of Conditional Robot Behaviors, we should conclude that the robot will
not likely return to the director until it has run quite low on energy. If it roams until it has
expended approximately 1/3 of its energy reserves, then we should expect that it would
take another 1/3 to get back. If we could induce the robot to explore or perform some
other task objective before it has expended 1/3 of its energy reserves, then the energy
threshold for the switch between exploration and return could be temporarily reset
accordingly, permitting a longer task duration.


Energy Reserves        Communications Signal
                       strong                          moderate                   weak
Strong (3/3 - 2/3)     Deploy, roam, or perform task   Explore, or perform task   Climb to restore communications
Moderate (2/3 - 1/3)   Explore, or perform task        Explore, or perform task   Climb to restore comms
Weak (1/3 - 0)         Return on comms gradient        Return on comms gradient   Return on comms gradient

Table 4. Example of Conditional Robot Behavior
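
Table 4 can be read as a lookup on two banded quantities. The sketch below assumes that both energy reserve and signal strength are expressed as fractions of full scale, and that the signal bands fall at the same thirds as the energy bands. Table 4 gives numeric bands only for energy, so the signal thresholds, the handling of values exactly on a boundary, and all names are hypothetical.

```python
def level(fraction):
    """Map a fraction of full scale (0.0 to 1.0) onto three bands."""
    if fraction > 2 / 3:
        return "strong"
    if fraction > 1 / 3:
        return "moderate"
    return "weak"


# Keys are (energy band, communications band), following the rows of Table 4.
BEHAVIOR = {
    ("strong", "strong"): "deploy, roam, or perform task",
    ("strong", "moderate"): "explore, or perform task",
    ("strong", "weak"): "climb to restore communications",
    ("moderate", "strong"): "explore, or perform task",
    ("moderate", "moderate"): "explore, or perform task",
    ("moderate", "weak"): "climb to restore communications",
    ("weak", "strong"): "return on comms gradient",
    ("weak", "moderate"): "return on comms gradient",
    ("weak", "weak"): "return on comms gradient",
}


def select_behavior(energy_fraction, comms_fraction):
    """Return the Table 4 behavior for the given energy and signal fractions."""
    return BEHAVIOR[(level(energy_fraction), level(comms_fraction))]


choice = select_behavior(energy_fraction=0.5, comms_fraction=0.1)
```

With half of the energy remaining and a weak signal, the lookup selects climbing to restore communications. Once energy falls into the weak band, every signal condition leads to a return along the communications gradient.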

As  noted  above,  the  director  can  become  an  attractor  for  the  robot.  Thus  the  robot  should 
have  a  strong  propensity  to  seek  out  its  director  when  its  energy  reserves  are  low.  The 
communications  gradient  tells  the  robot  where  its  director  is  located  in  general.  Once 
back  into  the  environment  of  the  director,  the  directional  information  from  the 
communications  signal  strength  may  then  be  supplemented  with  video  or  IR  information 
to  localize  the  director.  Director  voice  signals  detected  from  the  robot’s  stereophonic 
microphones  may  also  be  used  to  localize  the  director. 

5.10.  Learning  Mobility

Using  the  organic  sensors  for  acceleration  and  orientation  with  respect  to  gravity,  touch, 
track  pressure,  and  magnetometry,  the  robot  should  be  able  to  detect  its  movement,  its 
conformation,  and  its  orientation  with  respect  to  the  earth’s  magnetic  field,  and  with 
respect  to  objects  against  which  it  is  leveraged. 

The robot is then presented with an objective. I have described how one such objective
can arise. That is the orientation and directed movement toward the robot’s human
director when energy reserves are low. The methods of movement across an undefined
terrain are left to the robot to determine. We should expect that a number of fixed action
patterns can be selected in response to simple events encountered during transit. These
alone could execute some routes if a path was not too difficult and sufficient time was
allotted. Our intent in providing opportunities for learning and adaptation utilizing
distance sensors is to increase the probability that the robot will negotiate more difficult
paths and select more appropriate routes to traverse the intervening distances in shorter
times than would be possible by a more random method.

Climbing could be appropriate when communications were lost, and when an elevation was detected to
climb.

5.11.  Learning  Decision  Making

As  the  purpose  of  the  robot  is  to  move  in  a  controlled  manner  through  its  environment,  it 
must  maintain  a  friction-supported  contact  with  some  leverage  points.  The  pod  plate 
pressure  sensors  provide  the  contact  information,  but  the  presence  of  friction  must  be 
assessed  by  other  means.  One  method  is  to  compare  track  velocity  with  accelerometer 
input  under  the  conditions  of  applied  force  from  the  track  motors.  Table  5  gives  some  of 
the  possible  outcomes. 


Applied Force   Track      Pod Accelerometer   Interpretation
on Track        Velocity   Output
high            high       high                adequate track friction
high            low        high                unlikely condition
low             high       high                inadequate track friction for pod momentum
low             low        high                inadequate track friction
high            high       low                 inadequate track friction for pod momentum
high            low        low                 adequate track friction for pod momentum
low             high       low                 inadequate track friction for pod momentum
low             low        low                 adequate track friction

Table 5. Possible Interpretations of Applied Force vs. Observed Motion
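
The comparison summarized in Table 5 can be sketched as a lookup over three binary judgments. The sketch assumes that the cells of Table 5 read in row order (accelerometer output high, then low, across the four force and velocity columns), that each measurement is normalized to the range 0 to 1, and that a single threshold separates "high" from "low". The threshold and all names are hypothetical.

```python
# Keys are (applied force, track velocity, pod accelerometer output).
INTERPRETATION = {
    ("high", "high", "high"): "adequate track friction",
    ("high", "low", "high"): "unlikely condition",
    ("low", "high", "high"): "inadequate track friction for pod momentum",
    ("low", "low", "high"): "inadequate track friction",
    ("high", "high", "low"): "inadequate track friction for pod momentum",
    ("high", "low", "low"): "adequate track friction for pod momentum",
    ("low", "high", "low"): "inadequate track friction for pod momentum",
    ("low", "low", "low"): "adequate track friction",
}


def band(value, threshold=0.5):
    """Classify a normalized reading as 'high' or 'low'."""
    return "high" if value >= threshold else "low"


def interpret_friction(force, velocity, acceleration, threshold=0.5):
    """Return the Table 5 interpretation for normalized sensor readings."""
    key = (band(force, threshold), band(velocity, threshold),
           band(acceleration, threshold))
    return INTERPRETATION[key]


# Tracks driven hard and spinning fast, but the pod is barely accelerating.
result = interpret_friction(force=0.9, velocity=0.9, acceleration=0.1)
```

In this example the tracks are slipping, and the lookup reports inadequate track friction for pod momentum. A judgment of this kind could serve as the trigger for a change of conformation.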

The  robot  should  endeavor  to  keep  friction  adequate  for  the  current  combination  of  the 
load  and  the  applied  force.  Depending  upon  the  present  conformation  of  the  vehicle,  the 
robot  could  modify  its  conformation  to  improve  friction.  A  change  in  friction  would 
trigger  a  learning  algorithm  associating  the  previous  behavior  (conformation)  with  the 
perceived  environmental  conditions  and  the  behavior  that  resulted  in  the  new 
conformation (based upon the law of effect). The reader may recall some of the possible
conformations from Figure 4.

6.  Operational  Implications

6.1.  The  robot  must  always  be  “on”.

The first requirement for an independent agent is that it must remain always in the "on"
state. If an on/off switch is provided, it should be left in the "on" position, and only
switched to the "off" position when major electrical modifications are required.
Techniques for "hot swapping" of batteries, motors, and electronic components, that
selectively turn off only portions of the robot during repairs and maintenance, may make
it possible to keep the total robot system always in the "on" state.

It may seem counter-productive to have a robot that is always “on”. However, this does
not mean that the robot must always be “on the move”. Rather, the robot must be always
ready to move, and may even move often if the conditions are appropriate; for example,
the robot might be located in a high traffic area and simply be “in the way”.

Let us consider some of the other advantages, even the necessity, of a robot that is
perpetually “on”. One advantage the owner of the robot would gain by permitting the
robot to remain “on” is that the robot could be continuously prepared for work. Another
advantage is that an “on” robot could spontaneously become active and exercise its skills.
This exercise could improve its adaptation to its work environment without the need for
operator supervision. An adapted robot may then be able to work independently of
human control.

Operationally, we expect that the robots will often be in the company of humans. Humans
move about frequently. They do not like to lug their equipment with them when they
move. That is why the Army is asking for a very lightweight mobile robot. But regardless
of weight, it would reduce human workload considerably if the robots could orient to
their human operators and keep up with them when they did move. To accomplish this,
the robot must be able to operate to the limits of the human operator’s mobility envelope.
This mobility envelope includes some of the following:

■ Rapid waking and activation

■ Traversals on a planar surface at rates less than four minutes per statute mile.

■ Traversals for one hundred feet across a horizontal four-inch beam.

■ Vertical jumps over a seven-foot bar.

Two disadvantages of a periodically active robot could be that the robot might wear out its mechanical
apparatus, and consume more energy than one that remained mostly in the "off" condition.

Perhaps the best way to get a good impression of the limits of the human mobility envelope is to watch
the series of events at world-class track & field and gymnastics competitions. An alternative, and one
with unequivocal military significance, is to examine the mobility required by a basic training confidence
or obstacle course. These courses stress the strength and agility of young recruits without the benefit of
levers or cushions. The course requirements are established not only to test the physical fitness of the
recruits, but also to assess the readiness of the recruits to meet actual operational conditions. If we intend
for our robots to accompany the operators in the field, the robots too should meet those strength and agility
standards.

■ Vertical climbs of a thirty-foot rope

■  Vertical  climbs  of  a  fifty-foot  ladder  or  wall  with  foot  and  hand  holds. 

■  Horizontal  jumps  over  a  twenty-eight-foot  span. 

■  Crawls  beneath  or  slips  between  a  ten-inch  space. 

■  Swims  in  sea-state-one  for  one  mile. 
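
As a rough aid to reading the list above, the quoted limits can be converted to SI units. The sketch below is plain arithmetic on the figures in the list; the dictionary labels are invented here, and it assumes the one-mile swim means a statute mile, which the text does not state. A pace of four minutes per statute mile corresponds to 15 miles per hour, or about 6.7 m/s.

```python
# Conversion factors (all exact by definition).
METERS_PER_FOOT = 0.3048
METERS_PER_INCH = 0.0254
METERS_PER_STATUTE_MILE = 1609.344


def pace_to_speed_m_per_s(minutes_per_mile):
    """Speed in m/s corresponding to a pace in minutes per statute mile."""
    return METERS_PER_STATUTE_MILE / (minutes_per_mile * 60.0)


# Labels are illustrative; the numbers come from the list in the text.
envelope_si = {
    "planar speed (m/s)": pace_to_speed_m_per_s(4.0),
    "beam length (m)": 100 * METERS_PER_FOOT,
    "beam width (m)": 4 * METERS_PER_INCH,
    "vertical jump (m)": 7 * METERS_PER_FOOT,
    "rope climb (m)": 30 * METERS_PER_FOOT,
    "ladder or wall climb (m)": 50 * METERS_PER_FOOT,
    "horizontal jump (m)": 28 * METERS_PER_FOOT,
    "crawl clearance (m)": 10 * METERS_PER_INCH,
    "swim distance (m)": METERS_PER_STATUTE_MILE,  # assumes a statute mile
}

for name, value in envelope_si.items():
    print(f"{name}: {value:.2f}")
```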

Of  course,  few  humans  can  perform  to  any  of  the  above  limits.  A  more  practical  standard 
for  robotic  support  vehicles  might  be  the  modal  performance  standards  of  the  human 
population.  But  some  developers  may  question  the  wisdom  of  applying  such  mediocre 
and  languorous  standards  to  a  species  with  fewer  constraints  and  greater  promise. 

6.2.  The  Control  of  Activity  and  Movement 

Historically,  our  concept  of  operation  of  machines  involves  machine  quiescence  until  a 
specific instruction is given by the operator to the machine to commence some pre-defined
algorithm. The instruction is the initiating action that I will call here a
provocation.  Biological  agents  also  respond  to  provocations  with  pre-planned  algorithms, 
but  the  variety  of  sources  of  the  provocation  can  be  quite  large.  In  addition,  the 
relationship  between  any  specific  provocation  and  a  particular  algorithm  often  is  acquired 
through  experience.  Prior  to  the  acquisition  of  significant  experience,  a  naive  agent  may 
respond  to  provocation  in  a  generalized  way  that  could  involve  simply  bolting  from  its 
position.  This  bolting  might  facilitate  an  escape  from  the  provocation.  In  this  case,  an 
observer  might  notice  that  the  provocation  was  external  to  the  agent,  a  loud  sound  for 
example.  On  other  occasions,  observers  notice  that  agents  appear  to  spontaneously  move 
about  with  no  obvious  provocation.  What  the  observers  cannot  notice  in  most  of  these 
cases,  however,  is  the  provocation  from  within  the  agent.  Hunger,  defined  by  a  drop  in 
energy  reserves,  is  a  common  provocation  that  motivates  agents.  At  other  times,  the 
random  firing  of  neurons  due  to  an  accumulation  of  intrinsic  and  extrinsic  sub-threshold 
noise  is  sufficient  to  initiate  overt  behavior. 
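
The three kinds of provocation described above (an external event, a drop in energy reserves, and accumulated sub-threshold noise) can be illustrated with a small sketch. It is not the paper's design: the class name `ProvocationMonitor`, the threshold values, the leak factor, and the uniform noise model are all assumptions chosen only to make the idea concrete.

```python
import random


class ProvocationMonitor:
    """Hypothetical sketch of provocation sources for an always-on agent."""

    def __init__(self, hunger_level=0.3, firing_threshold=1.0,
                 leak=0.9, noise_scale=0.2, seed=None):
        self.hunger_level = hunger_level          # assumed energy fraction counted as hunger
        self.firing_threshold = firing_threshold  # assumed level that triggers behavior
        self.leak = leak                          # fraction of accumulated activity kept per step
        self.noise_scale = noise_scale            # assumed upper bound of intrinsic noise per step
        self.activity = 0.0
        self.rng = random.Random(seed)

    def step(self, energy_reserve, external_stimulus=0.0):
        """Return the list of provocations active on this time step."""
        provocations = []
        if external_stimulus >= self.firing_threshold:
            provocations.append("external")       # for example, a loud sound
        else:
            # Sub-threshold extrinsic input still feeds the accumulator.
            self.activity += external_stimulus
        if energy_reserve < self.hunger_level:
            provocations.append("hunger")         # drop in energy reserves
        # Intrinsic noise accumulates with a leak; crossing the threshold
        # provokes activity that an observer would call spontaneous.
        self.activity = (self.leak * self.activity
                         + self.rng.uniform(0.0, self.noise_scale))
        if self.activity >= self.firing_threshold:
            provocations.append("spontaneous")
            self.activity = 0.0
        return provocations


monitor = ProvocationMonitor(seed=1)
for t in range(50):
    active = monitor.step(energy_reserve=1.0 - 0.02 * t)
    if active:
        print(t, active)
```

In this sketch only the "external" case would be visible to an observer; the "hunger" and "spontaneous" cases arise from state held inside the agent, which matches the distinction drawn in the text.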

7.  Recapitulation 

Combat  operators  of  unmanned  ground  vehicles  report  that  mobility  is  a  serious  limiting 
factor  in  their  usefulness.  Because  of  their  low  stature,  small  unmanned  ground  vehicles 
rarely  can  scale  obstacles  of  heights  greater  than  10  inches,  regularly  stall  on  underbrush, 
and  frequently  fail  to  penetrate  dense  growths  of  trees,  all  of  which  admit  human 
operators  due  to  the  human’s  flexibility  and  multiple  degrees  of  motion  freedom.  It  is 
possible to add motion degrees of freedom to a small unmanned ground vehicle, but this
creates a more difficult problem of control and coordination.

Most  existing  unmanned  vehicles  are  controlled  by  teleoperation.  The  human  operator, 
usually  through  a  joystick  and  radio  link,  directs  a  robot’s  single  degree  of  freedom,  or 
multiple  degrees  of  freedom  sequentially,  to  execute  some  maneuver.  Humans  require 
intensive  training,  often  taking  years,  to  manage  the  coordination  of  more  than  one 
degree of freedom (for example, in playing the piano). Because of this human
cognitive/performance limitation, the use of small unmanned ground vehicles with
sufficient degrees of motion freedom for operation in tactical situations involving
obstacle-dense natural terrain will likely not be possible without competent and adaptive
control processes resident on the vehicle. It is to this requirement that the present effort
is dedicated.

The present approach builds upon an idea that is at least several hundred million years
old. This idea is that agent intelligence must develop from processes that promote the
survival  of  the  agent.  We  took  this  idea  and  first  built  a  robot  agent  (Figure  1),  adhering 
closely  to  existing  military  requirements  for  the  Future  Combat  Systems  (FCS)  Soldier 
Unmanned Ground Vehicle (SUGV), but added to those requirements elements necessary
(but not yet sufficient) to develop an intelligent adaptive controller. The elements are
multiple degrees of motion freedom, and sensors of critical events in the internal and
external environments. Still needed to complete the set, and what the present effort
intends to supply, are hard-wired fixed action patterns, semi-modifiable basic reactive
patterns, and the mechanisms by which our robot agent will be able to acquire mobility
and survival skills. The control architecture will contain these elements and permit the
acquisition  of  novel  behavioral  patterns  by  the  robot  to  improve  its  adaptation  to  its 
environment. 

Nearly  all  practical  unmanned  vehicle  systems  to  date  depend  upon  human  decision 
making  during  mission  execution.  The  degree  of  dependence  is  proportional  to  the 
complexities  of  the  mission  and  of  the  operational  environment.  Developers  have  hoped 
to  reduce  human  involvement  by  automating  the  required  decision  making  processes  and 
embedding  them  in  the  vehicles,  but  this  makes  the  systems  fragile  under  uncertainties. 
We  intend  to  take  the  process  beyond  automation  to  permit  our  robotic  agent  to  make 
operational  decisions  and  learn  novel  behaviors  using  criteria  related  to  internal  state 
variables associated with the agent’s health. We expect that this approach will retain the
advantages of both independent activity and human involvement. It provides the means
by which the vehicle can evaluate responses to novel circumstances, and by which a
human operator may become associated with certain favorable state changes of the agent.
The operator can then control the agent by biasing certain of the robot’s intrinsic goals,
and by aperiodically negating the robot’s selected means to those goals, rather than by
constant exertion to drive the robot to the operator’s objectives. This approach
will result in a very different kind of artificial agent. Because our aim with this work is
to  lay  the  essential  foundation  for  all  higher-level  intelligent  processes  that  emulate  the 
biological,  when  successful  we  will  be  well-prepared  to  explore  methods  for  decision 
making  and  tactical  behaviors  in  the  agent  that  are  required  for  collaboration  with  other 
unmanned  systems,  and  with  humans. 
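
The style of operator control described above, biasing intrinsic goals and occasionally negating a chosen means, can be sketched as follows. This is an assumption-laden illustration only: the function names `select_goal` and `select_means`, the goal and means labels, and the additive form of the bias are hypothetical and are not taken from the NUGV design.

```python
def select_goal(intrinsic_drives, operator_bias=None):
    """Pick the goal with the highest drive after adding any operator bias."""
    operator_bias = operator_bias or {}
    biased = {goal: drive + operator_bias.get(goal, 0.0)
              for goal, drive in intrinsic_drives.items()}
    return max(biased, key=biased.get)


def select_means(candidate_means, vetoed=frozenset()):
    """Pick the robot's preferred means, skipping any the operator has negated.

    candidate_means is ordered from most to least preferred by the robot.
    Returns None when every candidate has been negated.
    """
    for means in candidate_means:
        if means not in vetoed:
            return means
    return None


# Made-up drives and means, for illustration only.
drives = {"recharge": 0.6, "explore": 0.4, "follow_operator": 0.3}
print(select_goal(drives))                            # the robot's own choice
print(select_goal(drives, {"follow_operator": 0.5}))  # the operator biases a goal
print(select_means(["climb_over", "go_around"], vetoed={"climb_over"}))
```

The point of the sketch is that the operator never issues a motion command: the robot still selects its own goal and means, and the operator only shifts weights or removes options.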


In addition, radio-frequency communication limitations will have negative consequences for remote
control of unmanned vehicles in complex scenarios.